DerivaModel

The DerivaModel class provides schema introspection and manipulation capabilities for Deriva catalogs. It handles table relationships, associations, and catalog structure management.

Model module for DerivaML.

This module provides catalog and database model classes, plus annotation builders. Schema/data infrastructure that used to live here (SchemaBuilder, DataLoader, DataSource, etc.) now lives upstream in deriva.bag; import from there directly.

Key components:

- DerivaModel: Schema analysis utilities
- DatabaseModel: SQLite database from BDBag
- DerivaMLBagView: deriva-ml-domain view over a DatabaseModel

Lazy imports are used for DatabaseModel and DerivaMLBagView to avoid circular imports with the dataset module.
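One common way to defer imports like this is a module-level `__getattr__` (PEP 562). The sketch below illustrates that general technique only; it is an assumption, not a copy of this module's code. `make_lazy_module` and the `collections.OrderedDict` target are hypothetical stand-ins chosen so the example runs without deriva-ml installed.

```python
import importlib
import types


def make_lazy_module(name, lazy):
    """Build a module whose listed attributes are imported on first access.

    `lazy` maps a public name to (module path, attribute name).
    Hypothetical helper, for illustration only.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        # Called only when normal attribute lookup on the module fails.
        try:
            target_module, target_attr = lazy[attr]
        except KeyError:
            raise AttributeError(f"module {name!r} has no attribute {attr!r}") from None
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache so later lookups skip this hook
        return value

    module.__getattr__ = __getattr__
    return module


# Nothing is imported until the attribute is actually requested.
demo = make_lazy_module("demo", {"OrderedDict": ("collections", "OrderedDict")})
```

Because the import only runs inside the hook, two modules that refer to each other can both finish loading before either one touches the other's classes.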

Aggregate

Bases: str, Enum

Aggregation functions for pseudo-columns.

Used when a pseudo-column follows an inbound foreign key and returns multiple values that need to be aggregated.

Attributes:

MIN: Minimum value
MAX: Maximum value
CNT: Count of values
CNT_D: Count of distinct values
ARRAY: Array of all values
ARRAY_D: Array of distinct values

Example

Count related records:

    pc = PseudoColumn(  # doctest: +SKIP
        source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
        aggregate=Aggregate.CNT,
        markdown_name="Sample Count"
    )

Get distinct values as array:

    pc = PseudoColumn(  # doctest: +SKIP
        source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
        aggregate=Aggregate.ARRAY_D,
        markdown_name="Tags"
    )

Source code in src/deriva_ml/model/annotations.py
class Aggregate(str, Enum):
    """Aggregation functions for pseudo-columns.

    Used when a pseudo-column follows an inbound foreign key and returns
    multiple values that need to be aggregated.

    Attributes:
        MIN: Minimum value
        MAX: Maximum value
        CNT: Count of values
        CNT_D: Count of distinct values
        ARRAY: Array of all values
        ARRAY_D: Array of distinct values

    Example:
        >>> # Count related records
        >>> pc = PseudoColumn(  # doctest: +SKIP
        ...     source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
        ...     aggregate=Aggregate.CNT,
        ...     markdown_name="Sample Count"
        ... )
        >>>
        >>> # Get distinct values as array
        >>> pc = PseudoColumn(  # doctest: +SKIP
        ...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
        ...     aggregate=Aggregate.ARRAY_D,
        ...     markdown_name="Tags"
        ... )
    """

    MIN = "min"
    MAX = "max"
    CNT = "cnt"
    CNT_D = "cnt_d"
    ARRAY = "array"
    ARRAY_D = "array_d"
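Because `Aggregate` mixes in `str`, its members behave like plain strings when compared or serialized. The following is a minimal standalone sketch of that behavior; the enum is redeclared locally so the example runs without deriva-ml, and the `payload` dictionary is purely illustrative.

```python
import json
from enum import Enum


class Aggregate(str, Enum):
    """Local copy of the enum shown above, so the example is self-contained."""

    MIN = "min"
    MAX = "max"
    CNT = "cnt"
    CNT_D = "cnt_d"
    ARRAY = "array"
    ARRAY_D = "array_d"


# Members compare equal to their underlying string value.
is_cnt = Aggregate.CNT == "cnt"

# A member can be recovered from its string value, e.g. when parsing JSON.
parsed = Aggregate("array_d")

# json.dumps treats a str-mixin member as an ordinary string.
payload = json.dumps({"aggregate": Aggregate.CNT_D})
```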

ArrayUxMode

Bases: str, Enum

Display modes for array values in pseudo-columns.

Controls how arrays of values are rendered in the UI.

Attributes:

RAW: Raw array display
CSV: Comma-separated values
OLIST: Ordered (numbered) list
ULIST: Unordered (bulleted) list

Example

    pc = PseudoColumn(  # doctest: +SKIP
        source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
        aggregate=Aggregate.ARRAY,
        display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV)
    )

Source code in src/deriva_ml/model/annotations.py
class ArrayUxMode(str, Enum):
    """Display modes for array values in pseudo-columns.

    Controls how arrays of values are rendered in the UI.

    Attributes:
        RAW: Raw array display
        CSV: Comma-separated values
        OLIST: Ordered (numbered) list
        ULIST: Unordered (bulleted) list

    Example:
        >>> pc = PseudoColumn(  # doctest: +SKIP
        ...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
        ...     aggregate=Aggregate.ARRAY,
        ...     display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV)
        ... )
    """

    RAW = "raw"
    CSV = "csv"
    OLIST = "olist"
    ULIST = "ulist"
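To give a rough feel for what the four modes mean, here is a hypothetical text renderer. The actual rendering is done by the UI, not by this enum, so `render_array` and its output formats are assumptions made for illustration and may differ from what the UI produces.

```python
from enum import Enum


class ArrayUxMode(str, Enum):
    """Local copy of the enum shown above, so the example is self-contained."""

    RAW = "raw"
    CSV = "csv"
    OLIST = "olist"
    ULIST = "ulist"


def render_array(values: list, mode: ArrayUxMode) -> str:
    """Hypothetical plain-text approximation of each display mode."""
    if mode is ArrayUxMode.CSV:
        return ", ".join(values)
    if mode is ArrayUxMode.OLIST:
        return "\n".join(f"{i}. {v}" for i, v in enumerate(values, start=1))
    if mode is ArrayUxMode.ULIST:
        return "\n".join(f"- {v}" for v in values)
    # RAW: show the array as-is, without list formatting.
    return repr(values)
```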

ColumnDisplay dataclass

Bases: AnnotationBuilder

Column-display annotation builder.

Controls how column values are rendered.

Example

    cd = ColumnDisplay()  # doctest: +SKIP
    cd.default(ColumnDisplayOptions(  # doctest: +SKIP
        pre_format=PreFormat(format="%.2f")
    ))

Markdown link:

    cd = ColumnDisplay()  # doctest: +SKIP
    cd.default(ColumnDisplayOptions(  # doctest: +SKIP
        markdown_pattern="[Link]({{{_value}}})"
    ))

Source code in src/deriva_ml/model/annotations.py
@dataclass
class ColumnDisplay(AnnotationBuilder):
    """Column-display annotation builder.

    Controls how column values are rendered.

    Example:
        >>> cd = ColumnDisplay()  # doctest: +SKIP
        >>> cd.default(ColumnDisplayOptions(  # doctest: +SKIP
        ...     pre_format=PreFormat(format="%.2f")
        ... ))
        >>>
        >>> # Markdown link
        >>> cd = ColumnDisplay()  # doctest: +SKIP
        >>> cd.default(ColumnDisplayOptions(  # doctest: +SKIP
        ...     markdown_pattern="[Link]({{{_value}}})"
        ... ))
    """

    tag = TAG_COLUMN_DISPLAY

    _contexts: dict[str, ColumnDisplayOptions | str] = field(default_factory=dict)

    def set_context(self, context: str, options: ColumnDisplayOptions | str) -> "ColumnDisplay":
        """Set options for a context."""
        self._contexts[context] = options
        return self

    def default(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
        """Set default options."""
        return self.set_context(CONTEXT_DEFAULT, options)

    def compact(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
        """Set options for compact view."""
        return self.set_context(CONTEXT_COMPACT, options)

    def detailed(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
        """Set options for detailed view."""
        return self.set_context(CONTEXT_DETAILED, options)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, options in self._contexts.items():
            if isinstance(options, str):
                result[context] = options
            else:
                result[context] = options.to_dict()
        return result
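Each setter in the class above returns `self`, so calls can be chained, and `to_dict` passes string values through unchanged. The sketch below reproduces that pattern with stand-in classes so it runs without deriva-ml. `Options` and `Display` are hypothetical names, the context keys `"*"` and `"compact"` are assumed values for `CONTEXT_DEFAULT` and `CONTEXT_COMPACT`, and reading a string value as "reuse another context" is likewise an assumption about annotation conventions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional, Union

# Assumed context keys; the real CONTEXT_* constants are defined in deriva-ml.
CONTEXT_DEFAULT = "*"
CONTEXT_COMPACT = "compact"


@dataclass
class Options:
    """Hypothetical stand-in for ColumnDisplayOptions."""

    markdown_pattern: Optional[str] = None

    def to_dict(self) -> dict:
        if self.markdown_pattern is None:
            return {}
        return {"markdown_pattern": self.markdown_pattern}


@dataclass
class Display:
    """Hypothetical stand-in mirroring ColumnDisplay's chaining and to_dict."""

    _contexts: dict = field(default_factory=dict)

    def set_context(self, context: str, options: Union[Options, str]) -> "Display":
        self._contexts[context] = options
        return self  # returning self is what allows chaining

    def to_dict(self) -> dict:
        # Strings pass through as-is; option objects are serialized.
        return {
            context: options if isinstance(options, str) else options.to_dict()
            for context, options in self._contexts.items()
        }


annotation = (
    Display()
    .set_context(CONTEXT_DEFAULT, Options(markdown_pattern="**{{{_value}}}**"))
    .set_context(CONTEXT_COMPACT, CONTEXT_DEFAULT)  # string value, assumed to name a context
    .to_dict()
)
```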

compact

compact(
    options: ColumnDisplayOptions,
) -> "ColumnDisplay"

Set options for compact view.

Source code in src/deriva_ml/model/annotations.py
def compact(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
    """Set options for compact view."""
    return self.set_context(CONTEXT_COMPACT, options)

default

default(
    options: ColumnDisplayOptions,
) -> "ColumnDisplay"

Set default options.

Source code in src/deriva_ml/model/annotations.py
def default(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
    """Set default options."""
    return self.set_context(CONTEXT_DEFAULT, options)

detailed

detailed(
    options: ColumnDisplayOptions,
) -> "ColumnDisplay"

Set options for detailed view.

Source code in src/deriva_ml/model/annotations.py
def detailed(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
    """Set options for detailed view."""
    return self.set_context(CONTEXT_DETAILED, options)

set_context

set_context(
    context: str,
    options: ColumnDisplayOptions | str,
) -> "ColumnDisplay"

Set options for a context.

Source code in src/deriva_ml/model/annotations.py
def set_context(self, context: str, options: ColumnDisplayOptions | str) -> "ColumnDisplay":
    """Set options for a context."""
    self._contexts[context] = options
    return self

ColumnDisplayOptions dataclass

Options for displaying a column in a specific context.

Parameters:

pre_format (PreFormat | None, default None): Pre-formatting options
markdown_pattern (str | None, default None): Template for rendering
template_engine (TemplateEngine | None, default None): Template engine to use
column_order (list[SortKey] | Literal[False] | None, default None): Sort order, or False to disable

Source code in src/deriva_ml/model/annotations.py
@dataclass
class ColumnDisplayOptions:
    """Options for displaying a column in a specific context.

    Args:
        pre_format: Pre-formatting options
        markdown_pattern: Template for rendering
        template_engine: Template engine to use
        column_order: Sort order, or False to disable
    """

    pre_format: PreFormat | None = None
    markdown_pattern: str | None = None
    template_engine: TemplateEngine | None = None
    column_order: list[SortKey] | Literal[False] | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.pre_format is not None:
            result["pre_format"] = self.pre_format.to_dict()
        if self.markdown_pattern is not None:
            result["markdown_pattern"] = self.markdown_pattern
        if self.template_engine is not None:
            result["template_engine"] = self.template_engine.value
        if self.column_order is not None:
            if self.column_order is False:
                result["column_order"] = False
            else:
                result["column_order"] = [k.to_dict() if isinstance(k, SortKey) else k for k in self.column_order]
        return result
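The `to_dict` method above tests `is not None` rather than truthiness, which matters for `column_order`: `False` is a meaningful value (sorting disabled) and must survive serialization, while `None` means "not set" and is omitted. A minimal standalone sketch of that rule follows; `build_options` is a hypothetical helper, and sort keys are shown as plain strings for brevity.

```python
from typing import Any, Optional


def build_options(markdown_pattern: Optional[str] = None, column_order: Any = None) -> dict:
    """Hypothetical helper: omit unset fields but keep an explicit False."""
    result = {}
    if markdown_pattern is not None:
        result["markdown_pattern"] = markdown_pattern
    if column_order is not None:
        # A truthiness test here would silently drop False.
        result["column_order"] = False if column_order is False else list(column_order)
    return result


unset = build_options()
disabled = build_options(column_order=False)
sorted_by_name = build_options(column_order=["Name"])
```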

DerivaModel

Augmented interface to deriva model class.

This class provides a number of DerivaML specific methods that augment the interface in the deriva model class.
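The "augmenting" relationship can be pictured as a wrapper that adds its own methods and forwards everything else to the wrapped object through `__getattr__`. The sketch below shows that delegation pattern in isolation; `InnerModel` and `Wrapper` are hypothetical stand-ins, not the real deriva-py or DerivaML classes.

```python
class InnerModel:
    """Hypothetical stand-in for the wrapped model object."""

    def __init__(self):
        self.schemas = {"my_domain": object()}
        self.annotations = {}


class Wrapper:
    """Hypothetical wrapper that adds attributes and delegates the rest."""

    def __init__(self, model):
        self.model = model
        self.ml_schema = "deriva-ml"

    def __getattr__(self, name):
        # Python calls this only after normal lookup fails, so attributes
        # defined on the wrapper itself always take precedence.
        return getattr(self.model, name)


inner = InnerModel()
wrapper = Wrapper(inner)
```

Here `wrapper.schemas` resolves to `inner.schemas`, while `wrapper.ml_schema` is served by the wrapper directly.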

Attributes:

model: ERMRest model for the catalog.
catalog (ErmrestCatalog): ERMRest catalog for the model.
hostname: Hostname of the ERMRest server.
ml_schema: The ML schema name for the catalog.
domain_schemas: Frozenset of all domain schema names in the catalog.
default_schema: The default schema for table creation operations.
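The value of `default_schema` depends on how many domain schemas exist and on whether an explicit default was given. The sketch below restates that rule as a standalone function; `resolve_default_schema` is a hypothetical name, and it raises `ValueError` where the class raises its own exception type.

```python
from typing import Iterable, Optional, Union


def resolve_default_schema(
    domain_schemas: Union[str, Iterable[str]],
    default_schema: Optional[str] = None,
) -> Optional[str]:
    """Hypothetical restatement of the default-schema rule."""
    # A single name is accepted as shorthand for a one-element set.
    schemas = frozenset([domain_schemas] if isinstance(domain_schemas, str) else domain_schemas)

    if default_schema is not None:
        # An explicit default must be one of the domain schemas.
        if default_schema not in schemas:
            raise ValueError(f"default_schema {default_schema!r} is not in {sorted(schemas)}")
        return default_schema

    if len(schemas) == 1:
        # Exactly one domain schema: it becomes the default.
        return next(iter(schemas))

    # Zero or several domain schemas and no explicit choice: no default.
    return None
```

With no default set, callers that create tables need to name a schema explicitly.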

Source code in src/deriva_ml/model/catalog.py
class DerivaModel:
    """Augmented interface to deriva model class.

    This class provides a number of DerivaML specific methods that augment the interface in the deriva model class.

    Attributes:
        model: ERMRest model for the catalog.
        catalog: ERMRest catalog for the model.
        hostname: Hostname of the ERMRest server.
        ml_schema: The ML schema name for the catalog.
        domain_schemas: Frozenset of all domain schema names in the catalog.
        default_schema: The default schema for table creation operations.

    """

    def __init__(
        self,
        model: Model,
        ml_schema: str = ML_SCHEMA,
        domain_schemas: str | set[str] | None = None,
        default_schema: str | None = None,
    ):
        """Create and initialize a DerivaModel instance.

        This method will connect to a catalog and initialize schema configuration.
        This class is intended to be used as a base class on which domain-specific interfaces are built.

        Args:
            model: The ERMRest model for the catalog.
            ml_schema: The ML schema name.
            domain_schemas: Optional explicit set of domain schema names. If None,
                auto-detects all non-system schemas.
            default_schema: The default schema for table creation operations. If None
                and there is exactly one domain schema, that schema is used as default.
                If there are multiple domain schemas, default_schema must be specified.
        """
        self.model = model
        self.catalog: ErmrestCatalog = self.model.catalog
        self.hostname = self.catalog.deriva_server.server if isinstance(self.catalog, ErmrestCatalog) else "localhost"

        self.ml_schema = ml_schema

        # Determine domain schemas
        if domain_schemas is not None:
            if isinstance(domain_schemas, str):
                domain_schemas = {domain_schemas}
            self.domain_schemas = frozenset(domain_schemas)
        else:
            # Auto-detect all domain schemas
            self.domain_schemas = _get_domain_schemas(self.model.schemas.keys(), ml_schema)

        # Determine default schema for table creation
        if default_schema is not None:
            if default_schema not in self.domain_schemas:
                raise DerivaMLException(
                    f"default_schema '{default_schema}' is not in domain_schemas: {self.domain_schemas}"
                )
            self.default_schema = default_schema
        elif len(self.domain_schemas) == 1:
            # Single domain schema - use it as default
            self.default_schema = next(iter(self.domain_schemas))
        elif len(self.domain_schemas) == 0:
            # No domain schemas - default_schema will be None
            self.default_schema = None
        else:
            # Multiple domain schemas, no explicit default
            self.default_schema = None

    @classmethod
    def from_cached(
        cls,
        schema_dict: dict,
        *,
        catalog,
        ml_schema: str = ML_SCHEMA,
        domain_schemas: "str | set[str] | None" = None,
        default_schema: "str | None" = None,
    ) -> "DerivaModel":
        """Construct a DerivaModel from a cached ermrest /schema dict.

        No network is touched. The ``catalog`` argument is passed to
        deriva-py's ``Model(catalog, model_doc)`` constructor as the
        first positional argument; in offline mode it will be a
        :class:`~deriva_ml.core.catalog_stub.CatalogStub`, in online
        mode it is a real ``ErmrestCatalog``. ``DerivaModel.__init__``
        then reads the catalog back off ``model.catalog`` as usual.

        This replicates what ``Model.fromcatalog(catalog)`` does
        online — the online call fetches the schema dict via
        ``catalog.getCatalogSchema()`` (cached and ETag-revalidated
        by deriva-py) and passes the result to ``Model(catalog, dict)``.
        Here we pass in the already-cached dict from
        :class:`~deriva_ml.core.schema_cache.SchemaCache`.

        Args:
            schema_dict: The JSON payload from a previous
                ``catalog.getCatalogSchema()`` call (or any equivalent
                ``/schema`` GET), as persisted by ``SchemaCache``.
            catalog: The catalog object to associate with the model.
                Pass a real ``ErmrestCatalog`` online, or a
                ``CatalogStub`` offline.
            ml_schema: ML schema name (default ``"deriva-ml"``).
            domain_schemas: Optional explicit set of domain schema
                names. If None, auto-detects all non-system schemas
                from the cached dict.
            default_schema: Optional default schema name.

        Returns:
            A ``DerivaModel`` wrapping a deriva-py ``Model``
            reconstructed from the dict.

        Example:
            >>> cached = schema_cache.load(hostname, catalog_id)  # doctest: +SKIP
            >>> model = DerivaModel.from_cached(  # doctest: +SKIP
            ...     cached, catalog=catalog_stub, ml_schema="deriva-ml"
            ... )
        """
        # Model.__init__(catalog, model_doc) stores catalog as
        # self._catalog and exposes it via the .catalog property;
        # DerivaModel.__init__ then reads self.model.catalog.
        model = Model(catalog, schema_dict)
        return cls(
            model,
            ml_schema=ml_schema,
            domain_schemas=domain_schemas,
            default_schema=default_schema,
        )

    def is_system_schema(self, schema_name: str) -> bool:
        """Check if a schema is a system or ML schema.

        Args:
            schema_name: Name of the schema to check.

        Returns:
            True if the schema is a system or ML schema.

        Example:
            >>> model.is_system_schema("public")  # doctest: +SKIP
            True
            >>> model.is_system_schema("my_domain")  # doctest: +SKIP
            False
        """
        return _is_system_schema(schema_name, self.ml_schema)

    def is_domain_schema(self, schema_name: str) -> bool:
        """Check if a schema is a domain schema.

        Args:
            schema_name: Name of the schema to check.

        Returns:
            True if the schema is a domain schema.

        Example:
            >>> model.is_domain_schema("my_domain")  # doctest: +SKIP
            True
            >>> model.is_domain_schema("deriva-ml")  # doctest: +SKIP
            False
        """
        return schema_name in self.domain_schemas

    def _require_default_schema(self) -> str:
        """Get default schema, raising an error if not set.

        Returns:
            The default schema name.

        Raises:
            DerivaMLException: If default_schema is not set.
        """
        if self.default_schema is None:
            raise DerivaMLException(
                f"No default_schema set. With multiple domain schemas {self.domain_schemas}, "
                "you must either specify a default_schema when creating DerivaML or "
                "pass an explicit schema parameter to this method."
            )
        return self.default_schema

    def refresh_model(self) -> None:
        """Re-fetch the catalog model and replace ``self.model`` in place.

        Calls ``catalog.getCatalogModel()`` and rebinds the result to
        ``self.model``. Use this after a schema change (new table, column,
        or annotation) so subsequent introspection sees the current model.

        Caching note: the asset-execution-table cache
        (``_asset_execution_tables_cache``) is keyed on the *identity* of
        ``self.model``, so swapping the model out automatically invalidates it
        — the next call recomputes. The denormalize-planner cache
        (``_planner_cache``), if already built, keeps a reference to the
        previous model; if you depend on the planner reflecting a just-applied
        schema change, rebuild the instance rather than relying on
        ``refresh_model`` alone.

        Returns:
            None. Mutates ``self.model`` as a side effect.

        Example:
            >>> ml.create_vocabulary("Severity", "Lesion grade")  # doctest: +SKIP
            >>> ml.refresh_model()  # pick up the new table  # doctest: +SKIP
        """
        self.model = self.catalog.getCatalogModel()

    @property
    def chaise_config(self) -> dict[str, Any]:
        """Return the chaise configuration.

        Returns:
            The catalog-level Chaise display configuration annotation as a dict.

        Example:
            >>> cfg = model.chaise_config  # doctest: +SKIP
            >>> "navbarBrandText" in cfg  # doctest: +SKIP
            True
        """
        return self.model.chaise_config

    def get_schema_description(self, include_system_columns: bool = False) -> dict[str, Any]:
        """Return a JSON description of the catalog schema structure.

        Provides a structured representation of the domain and ML schemas including
        tables, columns, foreign keys, and relationships. Useful for understanding
        the data model structure programmatically.

        Args:
            include_system_columns: If True, include RID, RCT, RMT, RCB, RMB columns.
                Default False to reduce output size.

        Returns:
            Dictionary with schema structure:
            {
                "domain_schemas": ["schema_name1", "schema_name2"],
                "default_schema": "schema_name1",
                "ml_schema": "deriva-ml",
                "schemas": {
                    "schema_name": {
                        "tables": {
                            "TableName": {
                                "comment": "description",
                                "is_vocabulary": bool,
                                "is_asset": bool,
                                "is_association": bool,
                                "columns": [...],
                                "foreign_keys": [...],
                                "features": [...]
                            }
                        }
                    }
                }
            }

        Example:
            >>> desc = model.get_schema_description()  # doctest: +SKIP
            >>> sorted(desc["schemas"])  # doctest: +SKIP
            ['deriva-ml', 'my_domain']
            >>> desc["schemas"]["my_domain"]["tables"]["Image"]["is_asset"]  # doctest: +SKIP
            True
        """
        system_columns = {"RID", "RCT", "RMT", "RCB", "RMB"}
        result = {
            "domain_schemas": sorted(self.domain_schemas),
            "default_schema": self.default_schema,
            "ml_schema": self.ml_schema,
            "schemas": {},
        }

        # Include all domain schemas and the ML schema
        for schema_name in [*self.domain_schemas, self.ml_schema]:
            schema = self.model.schemas.get(schema_name)
            if not schema:
                continue

            schema_info = {"tables": {}}

            for table_name, table in schema.tables.items():
                # Get columns
                columns = []
                for col in table.columns:
                    if not include_system_columns and col.name in system_columns:
                        continue
                    columns.append(
                        {
                            "name": col.name,
                            "type": str(col.type.typename),
                            "nullok": col.nullok,
                            "comment": col.comment or "",
                        }
                    )

                # Get foreign keys
                foreign_keys = []
                for fk in table.foreign_keys:
                    fk_cols = [c.name for c in fk.foreign_key_columns]
                    ref_cols = [c.name for c in fk.referenced_columns]
                    foreign_keys.append(
                        {
                            "columns": fk_cols,
                            "referenced_table": f"{fk.pk_table.schema.name}.{fk.pk_table.name}",
                            "referenced_columns": ref_cols,
                        }
                    )

                # Get features if this is a domain table
                features = []
                if self.is_domain_schema(schema_name):
                    try:
                        for f in self.find_features(table):
                            features.append(
                                {
                                    "name": f.feature_name,
                                    "feature_table": f.feature_table.name,
                                }
                            )
                    except Exception as e:
                        logger.debug(f"Could not enumerate features for table {table.name}: {e}")

                table_info = {
                    "comment": table.comment or "",
                    "is_vocabulary": self.is_vocabulary(table),
                    "is_asset": self.is_asset(table),
                    "is_association": bool(self.is_association(table)),
                    "columns": columns,
                    "foreign_keys": foreign_keys,
                }
                if features:
                    table_info["features"] = features

                schema_info["tables"][table_name] = table_info

            result["schemas"][schema_name] = schema_info

        return result

    def __getattr__(self, name: str) -> Any:
        """Delegate unknown attribute access to the underlying deriva-py Model.

        Called only when ``name`` is not already an attribute of the
        ``DerivaModel`` instance (per Python's attribute resolution order),
        so explicit properties on this class — ``chaise_config``,
        ``apply``, ``catalog``, ``schemas`` (inherited via :class:`DatabaseModel`
        from :class:`deriva.bag.database.BagDatabase`) — take precedence.

        Kept as a fallback because ``self.model.<attr>`` is reached at 50+
        call sites for ``schemas``, ``annotations`` and a long tail of
        deriva-py Model attributes. Replacing each with explicit
        accessors would collide with mixins (e.g. ``BagDatabase.schemas``
        is an instance-attribute set in its ``__init__``, which a
        ``@property`` would shadow and block assignment to).
        """
        return getattr(self.model, name)

    def name_to_table(self, table: TableInput) -> Table:
        """Return the table object corresponding to the given table name.

        Searches domain schemas first (in sorted order), then ML schema, then WWW.
        If the table name appears in more than one schema, returns the first match.

        Args:
          table: An ERMRest table object or a string that is the name of the table.

        Returns:
          Table object.

        Raises:
          DerivaMLTableNotFound: If the table doesn't exist in any searchable schema.

        Example:
            >>> image = model.name_to_table("Image")  # doctest: +SKIP
            >>> image.name  # doctest: +SKIP
            'Image'
        """
        if isinstance(table, Table):
            return table

        # Search domain schemas (sorted for deterministic order), then ML schema, then WWW
        search_order = [*sorted(self.domain_schemas), self.ml_schema, "WWW"]
        for sname in search_order:
            if sname not in self.model.schemas:
                continue
            s = self.model.schemas[sname]
            if table in s.tables:
                return s.tables[table]
        raise DerivaMLTableNotFound(str(table), msg="Table doesn't exist in any searchable schema")

    def is_vocabulary(self, table: TableInput) -> bool:
        """Check if a given table is a controlled vocabulary table.

        Delegates to ``Table.is_vocabulary()`` in deriva-py, which enforces both
        the required column names AND their types (ermrest_curie, ermrest_uri,
        text, markdown). The type check is stricter than a column-name-only
        check — a table with an ``ID`` column of the wrong type correctly
        returns False here where the legacy name-only implementation would
        have returned True.

        Mirrors :meth:`is_asset`, which already delegates to ``Table.is_asset()``.

        Args:
            table: An ERMrest Table object or the name of the table.

        Returns:
            True if the table has the structure of a controlled vocabulary,
            False otherwise.

        Raises:
            DerivaMLTableNotFound: If the table doesn't exist in any searchable
                schema (raised by :meth:`name_to_table`).

        Example:
            >>> model.is_vocabulary("Image_Class")  # doctest: +SKIP
            True
            >>> model.is_vocabulary("Image")  # doctest: +SKIP
            False
        """
        table = self.name_to_table(table)
        return table.is_vocabulary()

    def vocab_columns(self, table: TableInput) -> dict[str, str]:
        """Return mapping from canonical vocab column name to actual column name.

        Canonical names are TitleCase (Name, ID, URI, Description, Synonyms).
        Actual names reflect the table's schema — could be lowercase for
        FaceBase-style catalogs or TitleCase for DerivaML-native tables.

        Args:
            table: A table object or the name of the table.

        Returns:
            Dict mapping canonical name to actual column name in the table.
            E.g. ``{"Name": "name", "ID": "id", ...}`` for FaceBase tables
            or ``{"Name": "Name", "ID": "ID", ...}`` for DerivaML tables.

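        Raises:
            DerivaMLTableNotFound: If the table doesn't exist (raised by
                :meth:`name_to_table`).
            KeyError: If the table lacks one of the canonical vocabulary
                columns (the lookup is case-insensitive but not optional).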

        Example:
            >>> model.vocab_columns("Image_Class")  # doctest: +SKIP
            {'Name': 'Name', 'ID': 'ID', 'URI': 'URI', 'Description': 'Description', 'Synonyms': 'Synonyms'}
        """
        table = self.name_to_table(table)
        col_map = {c.name.upper(): c.name for c in table.columns}
        return {canon: col_map[canon.upper()] for canon in ("Name", "ID", "URI", "Description", "Synonyms")}

    def is_association(
        self,
        table: TableInput,
        unqualified: bool = True,
        pure: bool = True,
        min_arity: int = 2,
        max_arity: int = 2,
    ) -> bool | set[str] | int:
        """Check whether ``table`` is an association (linking) table.

        Delegates to :meth:`deriva.core.ermrest_model.Table.is_association`.
        An association table mediates a many-to-many relationship between
        two (or more) tables via outbound FKs to each end.

        Args:
            table: Table name or :class:`Table` to inspect.
            unqualified: Per deriva-py — if True, the returned column set
                uses bare column names (no schema/table qualification).
                Only consulted when the return mode is the column-name set.
            pure: If True, require a *pure* association — no extra payload
                columns beyond the FK columns and system metadata (RID,
                RCT, RMT, RCB, RMB). Excludes feature tables, which carry
                their own non-FK columns.
            min_arity: Minimum number of outbound FKs that count as
                "associating." Defaults to 2 (a binary association).
            max_arity: Maximum number of outbound FKs. Defaults to 2.

        Returns:
            ``bool`` when the question is "is this *any* association at the
            requested arity," or ``set[str]`` / ``int`` when deriva-py's
            ``is_association`` returns the structural detail set instead.
            See :meth:`Table.is_association` for the full contract.

        Raises:
            DerivaMLTableNotFound: If ``table`` doesn't exist in any searchable
                schema (raised by :meth:`name_to_table`).

        Example:
            >>> bool(model.is_association("Dataset_Image"))  # doctest: +SKIP
            True
            >>> bool(model.is_association("Image"))  # doctest: +SKIP
            False
        """
        table = self.name_to_table(table)
        return table.is_association(unqualified=unqualified, pure=pure, min_arity=min_arity, max_arity=max_arity)

    def find_association(self, table1: TableInput, table2: TableInput) -> tuple[Table, str, str]:
        """Return the unique association table linking ``table1`` and ``table2``.

        Searches all associations on ``table1`` for one whose other-side
        FK lands on ``table2``. The result lets callers JOIN through the
        link without re-deriving the column names by hand.

        Args:
            table1: Either endpoint of the association. Table name or
                :class:`Table`.
            table2: The other endpoint. Table name or :class:`Table`.

        Returns:
            ``(assoc_table, table1_link_column, table2_link_column)``
            — the association :class:`Table` itself plus the *names* (as
            ``str``) of the two FK columns on it (one referencing
            ``table1``, one referencing ``table2``). The column names are
            returned as strings because every caller uses them directly
            as ``datapath`` ``.columns[...]`` keys or insert-row dict keys.

        Raises:
            NoAssociationException: If no association table connects the
                two tables. Callers that legitimately handle the "no link"
                case (e.g. probing whether an asset table is tracked
                through ``Execution``) should catch this specific subclass
                rather than the broader :class:`DerivaMLException`.
            AmbiguousAssociationException: If multiple association tables
                connect the two tables. The caller must disambiguate by
                naming the desired association table directly.

        Example:
            >>> assoc, c1, c2 = model.find_association("Dataset", "Image")  # doctest: +SKIP
            >>> assoc.name, c1, c2  # doctest: +SKIP
            ('Dataset_Image', 'Dataset', 'Image')
        """
        table1 = self.name_to_table(table1)
        table2 = self.name_to_table(table2)

        tables = [
            (a.table, a.self_fkey.columns[0].name, other_key.columns[0].name)
            for a in table1.find_associations(pure=False)
            if len(a.other_fkeys) == 1 and (other_key := a.other_fkeys.pop()).pk_table == table2
        ]

        if len(tables) == 1:
            return tables[0]
        elif len(tables) == 0:
            raise NoAssociationException(table1.name, table2.name)
        else:
            raise AmbiguousAssociationException(table1.name, table2.name, len(tables))

    def is_asset(self, table: TableInput) -> bool:
        """Check whether ``table`` is a proper asset table.

        Delegates to :meth:`Table.is_asset` from deriva-py, which verifies:

        - Required columns exist (``URL``, ``Filename``, ``Length``, ``MD5``).
        - ``URL``, ``Length``, ``MD5`` are NOT NULL.
        - ``URL`` carries the ``asset`` annotation.

        Args:
            table: Table name or :class:`Table` to inspect.

        Returns:
            True if all asset-table requirements are satisfied.

        Raises:
            DerivaMLTableNotFound: If ``table`` doesn't exist in any searchable
                schema (raised by :meth:`name_to_table`).

        Example:
            >>> model.is_asset("Image")  # doctest: +SKIP
            True
            >>> model.is_asset("Subject")  # doctest: +SKIP
            False
        """
        table = self.name_to_table(table)
        return table.is_asset()

    def find_asset_execution_tables(self) -> list[tuple[str, str]]:
        """Return the ``*_Execution`` association tables across all schemas.

        Walks every domain + ML schema once, finds tables whose
        name ends with ``_Execution``, and caches the result on
        the instance. Subsequent calls re-use the cache so callers
        that walk these tables repeatedly (e.g.
        :func:`~deriva_ml.execution._helpers.list_assets`) pay
        the schema-iteration cost exactly once per
        :class:`DerivaModel` lifetime.

        Two ``*_Execution`` tables are **excluded** because they're
        not asset-to-execution association tables despite the
        suffix:

        - ``Dataset_Execution`` — dataset linkage; consumed by
          :func:`~deriva_ml.execution._helpers.list_input_datasets`.
        - ``Execution_Execution`` — nested-execution hierarchy
          (parent/child); has no ``Asset_Role`` column. Hitting
          it during the ``list_assets(asset_role=...)`` walk
          produces an ``AttributeError`` ("no such column
          ``Asset_Role``") rather than just returning zero
          matches, so the exclusion is correctness-critical.

        The cache is invalidated whenever the underlying model
        object identity changes (e.g. after a catalog
        ``Model.fromcatalog`` refetch). In practice the model is
        only refreshed in long-lived sessions that mutate schema
        — the common case (read-mostly scripts) hits the cache
        on every call after the first.

        Returns:
            List of ``(schema_name, table_name)`` pairs. Domain schemas
            come first (sorted by name), then the ML schema; tables
            within a schema keep the model's own order.

        Example:
            >>> from deriva_ml.model.catalog import DerivaModel  # doctest: +SKIP
            >>> model.find_asset_execution_tables()  # doctest: +SKIP
            [('test_schema', 'Image_Execution'),
             ('deriva-ml', 'Execution_Asset_Execution')]
        """
        # Cache key is the underlying model object's identity —
        # a refetch swaps it out and we recompute. Cheap, safe.
        cached = getattr(self, "_asset_execution_tables_cache", None)
        if cached is not None and cached[0] is self.model:
            return cached[1]

        result: list[tuple[str, str]] = []
        schemas_to_search = [*sorted(self.domain_schemas), self.ml_schema]
        for schema_name in schemas_to_search:
            schema_obj = self.model.schemas.get(schema_name)
            if schema_obj is None:
                continue
            for table in schema_obj.tables.values():
                if not table.name.endswith("_Execution"):
                    continue
                # ``Dataset_Execution`` is the dataset linkage; not an
                # asset association. ``Execution_Execution`` is the
                # nested-execution parent/child table; has no
                # ``Asset_Role`` column so it'd crash any
                # ``list_assets(asset_role=...)`` walk.
                if table.name in ("Dataset_Execution", "Execution_Execution"):
                    continue
                result.append((schema_name, table.name))

        self._asset_execution_tables_cache = (self.model, result)
        return result

    def find_assets(self) -> list[Table]:
        """Return the list of asset tables in the current model.

        Returns:
            All tables across every schema that satisfy :meth:`is_asset`.

        Example:
            >>> [t.name for t in model.find_assets()]  # doctest: +SKIP
            ['Image', 'Execution_Asset']
        """
        return [t for s in self.model.schemas.values() for t in s.tables.values() if self.is_asset(t)]

    def find_vocabularies(self) -> list[Table]:
        """Return a list of all controlled vocabulary tables in domain and ML schemas.

        Returns:
            All tables in the domain and ML schemas that satisfy
            :meth:`is_vocabulary`.

        Example:
            >>> [t.name for t in model.find_vocabularies()]  # doctest: +SKIP
            ['Image_Class', 'Workflow_Type']
        """
        tables = []
        for schema_name in [*self.domain_schemas, self.ml_schema]:
            schema = self.model.schemas.get(schema_name)
            if schema:
                tables.extend(t for t in schema.tables.values() if self.is_vocabulary(t))
        return tables

    @validate_call(config=VALIDATION_CONFIG)
    def find_features(self, table: TableInput | None = None) -> Iterable[Feature]:
        """List features in the catalog.

        If a table is specified, returns only features for that table.
        If no table is specified, returns all features across all tables in the catalog.

        Args:
            table: Optional table to find features for. If None, returns all features
                in the catalog.

        Returns:
            An iterable of Feature instances describing the features.

        Example:
            >>> [f.feature_name for f in model.find_features("Image")]  # doctest: +SKIP
            ['BoundingBox', 'Quality']
            >>> all_features = list(model.find_features())  # doctest: +SKIP
        """

        def is_feature(a: FindAssociationResult) -> bool:
            """Check if association represents a feature.

            Args:
                a: Association result to check
            Returns:
                bool: True if association represents a feature
            """
            return {
                "Feature_Name",
                "Execution",
                a.self_fkey.foreign_key_columns[0].name,
            }.issubset({c.name for c in a.table.columns})

        def find_table_features(t: Table) -> list[Feature]:
            """Find all features for a single table.

            ``max_arity`` is left unbounded (``None``) so that
            *key-qualified* multi-value features are discovered. A
            qualifier is a value FK that participates in the
            association table's compound uniqueness key — e.g.
            ``Image_Side`` on eye-ai's ``Execution_Subject_Chart_Label``,
            where the same Subject legitimately has a left-eye and a
            right-eye row. Such a key includes
            ``{Execution, Subject, Feature_Name, Image_Side}``, giving a
            key-FK arity of 4. The former ``max_arity=3`` cap silently
            excluded these features from discovery (and therefore from
            ``lookup_feature`` / ``feature_values``).

            ``is_feature`` remains the sole filter: it still requires the
            ``Feature_Name`` and ``Execution`` FKs plus the target FK, so
            removing the ceiling cannot admit a non-feature association
            (a plain N-way domain join lacks ``Feature_Name``).
            ``min_arity=3`` is retained as the lower bound.
            """
            return [
                Feature(a, self) for a in t.find_associations(min_arity=3, max_arity=None, pure=False) if is_feature(a)
            ]

        if table is not None:
            # Find features for a specific table
            return find_table_features(self.name_to_table(table))

        # No table arg: discover features across the whole catalog.
        #
        # ``find_associations`` walks ``Table.referenced_by`` from each
        # candidate table, so the same association table is visited
        # once per FK target. For a single ``Image.Image_Classification``
        # feature backed by ``Execution_Image_Image_Classification``
        # (an association with FKs to Image, Execution, and the
        # Image_Class vocab) the naive cross-schema scan yields three
        # Feature objects -- one with ``target_table=Image`` (the
        # actual target), one with ``target_table=Execution``, and one
        # with ``target_table=Image_Class``. Only the first is what
        # callers want. See
        # docs/bugs/2026-05-19-find-features-duplicates.md.
        #
        # The fix is twofold:
        # 1. Skip iteration over tables that can never be the actual
        #    feature target -- the Execution table and any vocabulary
        #    table. Every feature association references both, so
        #    scanning them only produces duplicates.
        # 2. Dedup the remaining list by the association table itself
        #    (qualified schema.name), in case multiple distinct target
        #    tables share an association in some non-canonical layout.
        ml_schema_obj = self.model.schemas.get(self.ml_schema)
        execution_table = ml_schema_obj.tables.get("Execution") if ml_schema_obj is not None else None

        seen_feature_tables: set[tuple[str, str]] = set()
        features: list[Feature] = []
        for schema_name in [*self.domain_schemas, self.ml_schema]:
            schema = self.model.schemas.get(schema_name)
            if schema is None:
                continue
            for t in schema.tables.values():
                if execution_table is not None and t is execution_table:
                    continue
                if self.is_vocabulary(t):
                    continue
                for f in find_table_features(t):
                    key = (f.feature_table.schema.name, f.feature_table.name)
                    if key in seen_feature_tables:
                        continue
                    seen_feature_tables.add(key)
                    features.append(f)
        return features

    def lookup_feature(self, table: TableInput, feature_name: str) -> Feature:
        """Look up the named feature on ``table``.

        Features are association tables (linking a target table to
        vocabulary terms, assets, and metadata) discovered by
        :meth:`find_features`. This is the by-name accessor.

        Args:
            table: The target table the feature is attached to. Name or
                :class:`Table`.
            feature_name: The feature's name as set in its
                ``Feature_Name`` column.

        Returns:
            The :class:`Feature` wrapper for the matching association.

        Raises:
            DerivaMLTableNotFound: If ``table`` doesn't exist.
            DerivaMLFeatureNotFound: If no feature with
                ``feature_name`` is defined on ``table``.

        Example:
            >>> feature = model.lookup_feature("Image", "Quality")  # doctest: +SKIP
            >>> feature.feature_name  # doctest: +SKIP
            'Quality'
        """
        table = self.name_to_table(table)
        try:
            return [f for f in self.find_features(table) if f.feature_name == feature_name][0]
        except IndexError:
            raise DerivaMLFeatureNotFound(table.name, feature_name) from None

    def asset_metadata(self, table: TableInput) -> set[str]:
        """Return the non-asset columns of an asset table.

        Asset tables are ``Table.is_asset()`` tables: they carry the
        standard ``URL`` / ``Filename`` / ``Length`` / ``MD5`` columns
        plus arbitrary domain-specific metadata. This method returns
        the metadata column names — i.e. everything *except* the four
        standard asset columns (kept in
        :data:`~deriva_ml.core.definitions.DerivaAssetColumns`).

        Args:
            table: The asset table — name or :class:`Table` instance.

        Returns:
            Set of metadata column names. Empty if the asset table
            carries no extra columns.

        Raises:
            DerivaMLTableTypeError: If ``table`` is not an asset table.
            DerivaMLTableNotFound: If ``table`` doesn't exist (raised by
                :meth:`name_to_table`).

        Example:
            >>> sorted(model.asset_metadata("Image"))  # doctest: +SKIP
            ['Description', 'Image_Class']
        """
        table = self.name_to_table(table)

        if not self.is_asset(table):
            raise DerivaMLTableTypeError("asset table", table.name)
        return {c.name for c in table.columns} - DerivaAssetColumns

    def asset_metadata_columns(self, table: TableInput) -> list[Column]:
        """Return Column objects for the asset-metadata columns of ``table``.

        Like :meth:`asset_metadata` but returns the :class:`Column`
        instances (not just names) so callers can inspect attributes
        such as ``nullok``. Results are sorted by column name for
        deterministic iteration.

        Args:
            table: Asset table name or Table object.

        Returns:
            Sorted list of Column objects.

        Raises:
            DerivaMLTableTypeError: If ``table`` is not an asset table.

        Example:
            >>> [c.name for c in model.asset_metadata_columns("Image")]  # doctest: +SKIP
            ['Description', 'Image_Class']
        """
        table = self.name_to_table(table)
        if not self.is_asset(table):
            raise DerivaMLTableTypeError("asset table", table.name)
        return sorted(
            (c for c in table.columns if c.name not in DerivaAssetColumns),
            key=lambda c: c.name,
        )

    def asset_metadata_sorted(self, table: TableInput) -> list[str]:
        """Return the asset-metadata column **names** in deterministic order.

        Sorted by name. Pins the alphabetic-order invariant in one
        place so call sites stay in lockstep:

        - :func:`~deriva_ml.core.upload_layout.asset_table_upload_spec`
          builds the upload regex from these names; the directory
          order in the staging tree must match the regex order.
        - :func:`~deriva_ml.execution.bag_commit._add_asset_rows_to_bag`
          emits metadata columns into the bag in the same order so
          the recorded rows align with the upload regex captures.

        Pre-extraction, each call site re-wrote
        ``sorted(model.asset_metadata(table))`` inline. Centralising
        the call shape means a future change to the ordering rule
        (e.g. case-insensitive sort, or sorted by FK target) lands
        once and everyone follows.

        Args:
            table: Asset table name or :class:`Table` instance.

        Returns:
            Sorted list of metadata column names. Empty list if
            the table carries no extra columns.

        Raises:
            DerivaMLTableTypeError: If ``table`` isn't an asset table.

        Example:
            >>> from deriva_ml.model.catalog import DerivaModel  # doctest: +SKIP
            >>> model.asset_metadata_sorted("Image")  # doctest: +SKIP
            ['Asset_Role', 'Description']
        """
        return sorted(self.asset_metadata(table))

    def apply(self) -> None:
        """Apply pending annotation/schema changes via the underlying Model.

        Thin passthrough to ``self.model.apply()``. Kept explicit so the
        annotation/schema commit boundary is visible on the DerivaModel
        public surface rather than hiding behind generic ``__getattr__``
        delegation.

        Refuses to run when ``self.catalog`` is a
        :class:`~deriva_ml.core.catalog_stub.CatalogStub` (offline mode):
        applying a schema change without a live catalog connection is
        nonsensical, and the underlying ``Model.apply()`` would otherwise
        raise an unhelpful :class:`DerivaMLReadOnlyError` once it reached
        through the stub.

        Raises:
            DerivaMLReadOnlyError: If this DerivaML instance is in offline
                mode (``self.catalog`` is a ``CatalogStub``).

        Example:
            >>> table.annotations[Display.tag] = display.to_dict()  # doctest: +SKIP
            >>> model.apply()  # commit the staged annotation  # doctest: +SKIP
        """
        if isinstance(self.catalog, CatalogStub):
            raise DerivaMLReadOnlyError(
                "DerivaModel.apply() requires online mode; this DerivaML instance was constructed with mode=offline."
            )
        self.model.apply()

    def is_dataset_rid(self, rid: RID, deleted: bool = False) -> bool:
        """Check whether ``rid`` identifies a (non-deleted) Dataset row.

        Resolves ``rid`` against the live catalog via
        :meth:`ErmrestCatalog.resolve_rid` to determine which table it
        belongs to, then verifies it's the ``Dataset`` table. By default
        deleted datasets are treated as not-a-dataset; pass ``deleted=True``
        to include tombstoned rows in the positive set.

        Args:
            rid: The RID to test.
            deleted: If True, return ``True`` for soft-deleted datasets
                too. Defaults to False (deleted rows return ``False``).

        Returns:
            True if ``rid`` is a Dataset row (filtered by the ``deleted``
            flag). False if it points at a different table, or at a
            deleted Dataset row while ``deleted`` is False.

        Raises:
            DerivaMLException: If ``rid`` doesn't resolve in the catalog
                at all (typically an invalid or fabricated RID).

        Example:
            >>> model.is_dataset_rid("1-abc123")  # doctest: +SKIP
            True
            >>> model.is_dataset_rid("1-image01")  # an Image RID  # doctest: +SKIP
            False
        """
        try:
            rid_info = self.model.catalog.resolve_rid(rid, self.model)
        except KeyError as _e:
            raise DerivaMLException(f"Invalid RID {rid}") from _e
        if rid_info.table.name != "Dataset":
            return False
        elif deleted:
            # Got a dataset RID and the caller accepts deleted rows, so no ``Deleted`` check is needed.
            return True
        else:
            return not list(rid_info.datapath.entities().fetch())[0]["Deleted"]

    def list_dataset_element_types(self) -> list[Table]:
        """List the deriva-py ``Table`` types that can be dataset members.

        Walks ``Dataset.find_associations()`` and returns the
        ``other_fkey.pk_table`` for each association whose target is a
        domain-schema table or the Dataset table itself. Used by
        ``DerivaML.add_dataset_members`` to validate the kind of row
        a caller is trying to add to a dataset.

        Returns:
            A list of :class:`~deriva.core.ermrest_model.Table`
            objects — one per valid member type.

        Example:
            >>> [t.name for t in model.list_dataset_element_types()]  # doctest: +SKIP
            ['Image', 'Subject', 'Dataset']
        """

        dataset_table = self.name_to_table("Dataset")

        def is_domain_or_dataset_table(table: Table) -> bool:
            return self.is_domain_schema(table.schema.name) or table.name == dataset_table.name

        return [
            t
            for a in dataset_table.find_associations()
            if is_domain_or_dataset_table(t := a.other_fkeys.pop().pk_table)
        ]

    # ------------------------------------------------------------------
    # Denormalization planner
    #
    # The planner — schema-graph reachability + JOIN tree construction —
    # was extracted into :mod:`deriva_ml.model.denormalize_planner` in
    # Phase 3 (audit §5.2). It's a ~1100 LoC algorithm subsystem with a
    # narrow consumer set (``local_db/`` + a couple of single-line
    # sites). The split keeps :class:`DerivaModel` focused on its
    # wide-fan-out role (introspection touched by every mixin) and
    # gives the planner its own focused module.
    #
    # Access the planner via :attr:`_planner`. All planner methods are
    # underscore-prefixed because the planner is internal to the
    # denormalization subsystem; the user-facing API is
    # :class:`local_db.denormalize.Denormalizer`.
    # ------------------------------------------------------------------

    @property
    def _planner(self) -> "DenormalizePlanner":
        """Lazily-constructed :class:`DenormalizePlanner` for this model.

        Cached on the instance after first access so reachability /
        join-tree computations don't repeat the construction cost. The
        planner reads schemas/tables through ``self`` and never mutates
        the model, so the cache is safe to share. The planner itself
        isn't documented as thread-safe — callers needing concurrent
        access should construct their own ``DenormalizePlanner``
        per-thread.

        Uses a single-underscore attribute name (``_planner_cache``)
        rather than double-underscore to avoid Python's name
        mangling and keep ``hasattr`` lookups straightforward.
        """
        if not hasattr(self, "_planner_cache"):
            self._planner_cache = DenormalizePlanner(self)
        return self._planner_cache

    def create_table(self, table_def: TableDefinition, schema: str | None = None) -> Table:
        """Create a new table from TableDefinition.

        Args:
            table_def: Table definition (dataclass or dict).
            schema: Schema to create the table in. If None, uses default_schema.

        Returns:
            The newly created Table.

        Raises:
            DerivaMLException: If no schema specified and default_schema is not set.

        Note: @validate_call removed because TableDefinition is now a dataclass from
        deriva.core.typed and Pydantic validation doesn't work well with dataclass fields.

        Example:
            >>> from deriva_ml.core.definitions import TableDefinition, ColumnDefinition  # doctest: +SKIP
            >>> table_def = TableDefinition(  # doctest: +SKIP
            ...     name="Observation",
            ...     column_defs=[ColumnDefinition(name="Note", type="text")],
            ... )
            >>> new_table = model.create_table(table_def, schema="my_domain")  # doctest: +SKIP
        """
        schema = schema or self._require_default_schema()
        # Handle both TableDefinition (dataclass with to_dict) and plain dicts
        table_dict = table_def.to_dict() if hasattr(table_def, "to_dict") else table_def
        return self.model.schemas[schema].create_table(table_dict)

    def _define_association(
        self,
        associates: list,
        metadata: list | None = None,
        table_name: str | None = None,
        comment: str | None = None,
        **kwargs,
    ) -> dict:
        """Build an association table definition with vocab-aware key selection.

        Wraps Table.define_association to ensure non-vocabulary tables use RID
        as their foreign key target. The default key search heuristic in
        define_association prefers Name/ID keys over RID, which is correct for
        vocabulary tables (FK to human-readable Name) but wrong for domain
        tables that happen to have non-nullable Name or ID keys (e.g., tables
        in cloned catalogs like FaceBase).

        Args:
            associates: Reference targets being associated (Table, Key, or tuples).
            metadata: Additional metadata fields and/or reference targets.
            table_name: Name for the association table.
            comment: Comment for the association table.
            **kwargs: Additional arguments passed to Table.define_association.

        Returns:
            Table definition dict suitable for create_table.
        """
        metadata = metadata or []

        def _resolve_key(ref):
            """Convert non-vocabulary Table references to their RID Key."""
            if isinstance(ref, tuple):
                # (name, Table) or (name, nullok, Table) — resolve the Table element
                items = list(ref)
                table_obj = items[-1]
                if isinstance(table_obj, Table) and not table_obj.is_vocabulary():
                    items[-1] = table_obj.key_by_columns(["RID"])
                return tuple(items)
            elif isinstance(ref, Table) and not ref.is_vocabulary():
                return ref.key_by_columns(["RID"])
            return ref  # Key objects or vocabulary Tables pass through

        resolved_associates = [_resolve_key(a) for a in associates]
        resolved_metadata = [_resolve_key(m) for m in metadata]

        return Table.define_association(
            associates=resolved_associates,
            metadata=resolved_metadata,
            table_name=table_name,
            comment=comment,
            **kwargs,
        )
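
The identity-keyed cache that `find_asset_execution_tables` uses in the listing above can be shown in isolation. The following is a minimal sketch, not the library's implementation: `SchemaIndex`, its `scans` counter, and the dict-based `model` are hypothetical stand-ins, and the sketch sorts every schema and table name rather than following the domain-then-ML order of the real method. It only illustrates that a result is reused while the same model object is installed and recomputed once that object is swapped out.

```python
class SchemaIndex:
    """Hypothetical stand-in for DerivaModel; only the caching pattern is shown."""

    # Tables that end in "_Execution" but are not asset associations.
    EXCLUDED = ("Dataset_Execution", "Execution_Execution")

    def __init__(self, model: dict[str, list[str]]) -> None:
        # ``model`` maps schema name -> table names (a toy replacement for Model).
        self.model = model
        self.scans = 0  # counts how often the schema walk actually runs

    def execution_tables(self) -> list[tuple[str, str]]:
        cached = getattr(self, "_cache", None)
        # Reuse the result only while the *same* model object is installed.
        if cached is not None and cached[0] is self.model:
            return cached[1]

        self.scans += 1
        result = [
            (schema, table)
            for schema in sorted(self.model)
            for table in sorted(self.model[schema])
            if table.endswith("_Execution") and table not in self.EXCLUDED
        ]
        # Store the model object alongside the result so identity can be compared later.
        self._cache = (self.model, result)
        return result


index = SchemaIndex(
    {
        "deriva-ml": ["Execution", "Dataset_Execution", "Execution_Asset_Execution"],
        "test_schema": ["Image", "Image_Execution"],
    }
)
first = index.execution_tables()
second = index.execution_tables()  # served from the cache
index.model = dict(index.model)    # simulate a refetch: equal content, new object
third = index.execution_tables()   # identity changed, so the walk runs again
```

Comparing with `is` rather than `==` keeps the check cheap and avoids treating a freshly fetched model as current merely because it compares equal to the old one.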

chaise_config property

chaise_config: dict[str, Any]

Return the chaise configuration.

Returns:

Type Description
dict[str, Any]

The catalog-level Chaise display configuration annotation as a dict.

Example

>>> cfg = model.chaise_config  # doctest: +SKIP
>>> "navbarBrandText" in cfg  # doctest: +SKIP
True

__getattr__

__getattr__(name: str) -> Any

Delegate unknown attribute access to the underlying deriva-py Model.

Called only when name is not already an attribute of the DerivaModel instance (per Python's attribute resolution order), so explicit properties on this class, namely chaise_config, apply, catalog, schemas (inherited via DatabaseModel from deriva.bag.database.BagDatabase), take precedence.

Kept as a fallback because self.model.<attr> is reached at 50+ call sites for schemas, annotations and a long tail of deriva-py Model attributes. Replacing each with explicit accessors would collide with mixins (e.g. BagDatabase.schemas is an instance-attribute set in its __init__, which a @property would shadow and block assignment to).

Source code in src/deriva_ml/model/catalog.py
def __getattr__(self, name: str) -> Any:
    """Delegate unknown attribute access to the underlying deriva-py Model.

    Called only when ``name`` is not already an attribute of the
    ``DerivaModel`` instance (per Python's attribute resolution order),
    so explicit properties on this class — ``chaise_config``,
    ``apply``, ``catalog``, ``schemas`` (inherited via :class:`DatabaseModel`
    from :class:`deriva.bag.database.BagDatabase`) — take precedence.

    Kept as a fallback because ``self.model.<attr>`` is reached at 50+
    call sites for ``schemas``, ``annotations`` and a long tail of
    deriva-py Model attributes. Replacing each with explicit
    accessors would collide with mixins (e.g. ``BagDatabase.schemas``
    is an instance-attribute set in its ``__init__``, which a
    ``@property`` would shadow and block assignment to).
    """
    return getattr(self.model, name)
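
The precedence rule described above follows from how Python invokes `__getattr__`: only after normal attribute lookup has failed. A minimal sketch, where `Inner` and `Wrapper` are hypothetical stand-ins for the deriva-py `Model` and `DerivaModel`:

```python
class Inner:
    """Hypothetical stand-in for the wrapped deriva-py Model."""
    annotations = {"tag": "value"}
    catalog = "inner-catalog"


class Wrapper:
    """Hypothetical stand-in for DerivaModel."""

    def __init__(self, model):
        self.model = model
        self.catalog = "wrapper-catalog"  # instance attribute: found by normal lookup

    @property
    def chaise_config(self):
        return {"navbarBrandText": "demo"}

    def __getattr__(self, name):
        # Reached only when normal lookup on the instance and class has failed.
        return getattr(self.model, name)


w = Wrapper(Inner())
```

In this sketch `w.catalog` and `w.chaise_config` are served by the wrapper, while `w.annotations` falls through to the wrapped object. One known hazard of the pattern: if `model` itself is missing from the instance, `getattr(self.model, ...)` re-enters `__getattr__` and recurses.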

__init__

__init__(
    model: Model,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: str | set[str] | None = None,
    default_schema: str | None = None,
)

Create and initialize a DerivaModel instance.

This method reads the catalog handle from the supplied model and initializes schema configuration; it does not open a connection itself. The class is intended to be used as a base class on which domain-specific interfaces are built.

Parameters:

Name Type Description Default
model Model

The ERMRest model for the catalog.

required
ml_schema str

The ML schema name.

ML_SCHEMA
domain_schemas str | set[str] | None

Optional explicit set of domain schema names. If None, auto-detects all non-system schemas.

None
default_schema str | None

The default schema for table creation operations. If None and there is exactly one domain schema, that schema is used as the default. If there are zero or multiple domain schemas, no default is set, so table-creation calls must pass a schema explicitly.

None
Source code in src/deriva_ml/model/catalog.py
def __init__(
    self,
    model: Model,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: str | set[str] | None = None,
    default_schema: str | None = None,
):
    """Create and initialize a DerivaModel instance.

    This method reads the catalog handle from the supplied model and initializes
    schema configuration; it does not open a connection itself.
    This class is intended to be used as a base class on which domain-specific interfaces are built.

    Args:
        model: The ERMRest model for the catalog.
        ml_schema: The ML schema name.
        domain_schemas: Optional explicit set of domain schema names. If None,
            auto-detects all non-system schemas.
        default_schema: The default schema for table creation operations. If None
            and there is exactly one domain schema, that schema is used as default.
            If there are zero or multiple domain schemas, no default is set, so
            table-creation calls must pass a schema explicitly.
    """
    self.model = model
    self.catalog: ErmrestCatalog = self.model.catalog
    self.hostname = self.catalog.deriva_server.server if isinstance(self.catalog, ErmrestCatalog) else "localhost"

    self.ml_schema = ml_schema

    # Determine domain schemas
    if domain_schemas is not None:
        if isinstance(domain_schemas, str):
            domain_schemas = {domain_schemas}
        self.domain_schemas = frozenset(domain_schemas)
    else:
        # Auto-detect all domain schemas
        self.domain_schemas = _get_domain_schemas(self.model.schemas.keys(), ml_schema)

    # Determine default schema for table creation
    if default_schema is not None:
        if default_schema not in self.domain_schemas:
            raise DerivaMLException(
                f"default_schema '{default_schema}' is not in domain_schemas: {self.domain_schemas}"
            )
        self.default_schema = default_schema
    elif len(self.domain_schemas) == 1:
        # Single domain schema - use it as default
        self.default_schema = next(iter(self.domain_schemas))
    elif len(self.domain_schemas) == 0:
        # No domain schemas - default_schema will be None
        self.default_schema = None
    else:
        # Multiple domain schemas, no explicit default
        self.default_schema = None
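
The schema-resolution rules in the constructor can be condensed into a pure function for experimentation. This is a sketch under stated assumptions: `resolve_schemas` and `SYSTEM_SCHEMAS` are hypothetical names, the exclusion list used by the real `_get_domain_schemas` helper may differ, and `ValueError` stands in for `DerivaMLException`.

```python
SYSTEM_SCHEMAS = frozenset({"public"})  # assumption: the real exclusion list may differ


def resolve_schemas(all_schemas, ml_schema, domain_schemas=None, default_schema=None):
    """Return (domain_schemas, default_schema) following the constructor's rules."""
    if domain_schemas is None:
        # Auto-detect: everything that is neither the ML schema nor a system schema.
        domains = frozenset(
            s for s in all_schemas if s != ml_schema and s not in SYSTEM_SCHEMAS
        )
    elif isinstance(domain_schemas, str):
        domains = frozenset({domain_schemas})
    else:
        domains = frozenset(domain_schemas)

    if default_schema is not None:
        if default_schema not in domains:
            raise ValueError(f"default_schema {default_schema!r} is not in {sorted(domains)}")
        return domains, default_schema
    if len(domains) == 1:
        # A single domain schema is unambiguous, so it becomes the default.
        return domains, next(iter(domains))
    return domains, None  # zero or several domain schemas: no default


single = resolve_schemas(["public", "deriva-ml", "lab"], "deriva-ml")
several = resolve_schemas(["public", "deriva-ml", "lab", "clinic"], "deriva-ml")
```

With one domain schema the default is inferred; with several it stays `None` until the caller names one.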

apply

apply() -> None

Apply pending annotation/schema changes via the underlying Model.

Thin passthrough to self.model.apply(). Kept explicit so the annotation/schema commit boundary is visible on the DerivaModel public surface rather than hiding behind generic __getattr__ delegation.

Refuses to run when self.catalog is a :class:~deriva_ml.core.catalog_stub.CatalogStub (offline mode): applying a schema change without a live catalog connection is nonsensical, and the underlying Model.apply() would otherwise raise an unhelpful :class:DerivaMLReadOnlyError once it reached through the stub.

Raises:

Type Description
DerivaMLReadOnlyError

If this DerivaML instance is in offline mode (self.catalog is a CatalogStub).

Example

>>> table.annotations[Display.tag] = display.to_dict()  # doctest: +SKIP
>>> model.apply()  # commit the staged annotation  # doctest: +SKIP

Source code in src/deriva_ml/model/catalog.py
def apply(self) -> None:
    """Apply pending annotation/schema changes via the underlying Model.

    Thin passthrough to ``self.model.apply()``. Kept explicit so the
    annotation/schema commit boundary is visible on the DerivaModel
    public surface rather than hiding behind generic ``__getattr__``
    delegation.

    Refuses to run when ``self.catalog`` is a
    :class:`~deriva_ml.core.catalog_stub.CatalogStub` (offline mode):
    applying a schema change without a live catalog connection is
    nonsensical, and the underlying ``Model.apply()`` would otherwise
    raise an unhelpful :class:`DerivaMLReadOnlyError` once it reached
    through the stub.

    Raises:
        DerivaMLReadOnlyError: If this DerivaML instance is in offline
            mode (``self.catalog`` is a ``CatalogStub``).

    Example:
        >>> table.annotations[Display.tag] = display.to_dict()  # doctest: +SKIP
        >>> model.apply()  # commit the staged annotation  # doctest: +SKIP
    """
    if isinstance(self.catalog, CatalogStub):
        raise DerivaMLReadOnlyError(
            "DerivaModel.apply() requires online mode; this DerivaML instance was constructed with mode=offline."
        )
    self.model.apply()
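
The offline guard is a plain type check performed before delegating. A minimal sketch, in which every class is a hypothetical stand-in (`ReadOnlyError` for `DerivaMLReadOnlyError`, `StubCatalog` for `CatalogStub`):

```python
class ReadOnlyError(RuntimeError):
    """Hypothetical stand-in for DerivaMLReadOnlyError."""


class StubCatalog:
    """Hypothetical stand-in for the offline CatalogStub."""


class LiveCatalog:
    """Hypothetical stand-in for a connected catalog."""


class FakeModel:
    def __init__(self):
        self.applied = 0

    def apply(self):
        self.applied += 1


class Wrapper:
    def __init__(self, model, catalog):
        self.model = model
        self.catalog = catalog

    def apply(self):
        # Fail early with a clear message instead of letting the stub fail later.
        if isinstance(self.catalog, StubCatalog):
            raise ReadOnlyError("apply() requires online mode")
        self.model.apply()


online = Wrapper(FakeModel(), LiveCatalog())
online.apply()

offline = Wrapper(FakeModel(), StubCatalog())
try:
    offline.apply()
    offline_error = None
except ReadOnlyError as exc:
    offline_error = exc
```

The check runs before any delegation, so in offline mode the wrapped model is never touched.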

asset_metadata

asset_metadata(
    table: TableInput,
) -> set[str]

Return the non-asset columns of an asset table.

Asset tables are Table.is_asset() tables: they carry the standard URL / Filename / Length / MD5 columns plus arbitrary domain-specific metadata. This method returns the metadata column names — i.e. everything except the four standard asset columns (kept in :data:~deriva_ml.core.definitions.DerivaAssetColumns).

Parameters:

Name Type Description Default
table TableInput

The asset table — name or :class:Table instance.

required

Returns:

Type Description
set[str]

Set of metadata column names. Empty if the asset table carries no extra columns.

Raises:

Type Description
DerivaMLTableTypeError

If table is not an asset table.

DerivaMLTableNotFound

If table doesn't exist (raised by :meth:name_to_table).

Example

>>> sorted(model.asset_metadata("Image"))  # doctest: +SKIP
['Description', 'Image_Class']

Source code in src/deriva_ml/model/catalog.py
def asset_metadata(self, table: TableInput) -> set[str]:
    """Return the non-asset columns of an asset table.

    Asset tables are ``Table.is_asset()`` tables: they carry the
    standard ``URL`` / ``Filename`` / ``Length`` / ``MD5`` columns
    plus arbitrary domain-specific metadata. This method returns
    the metadata column names — i.e. everything *except* the four
    standard asset columns (kept in
    :data:`~deriva_ml.core.definitions.DerivaAssetColumns`).

    Args:
        table: The asset table — name or :class:`Table` instance.

    Returns:
        Set of metadata column names. Empty if the asset table
        carries no extra columns.

    Raises:
        DerivaMLTableTypeError: If ``table`` is not an asset table.
        DerivaMLTableNotFound: If ``table`` doesn't exist (raised by
            :meth:`name_to_table`).

    Example:
        >>> sorted(model.asset_metadata("Image"))  # doctest: +SKIP
        ['Description', 'Image_Class']
    """
    table = self.name_to_table(table)

    if not self.is_asset(table):
        raise DerivaMLTableTypeError("asset table", table.name)
    return {c.name for c in table.columns} - DerivaAssetColumns
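
The metadata computation is a set difference over column names. The sketch below assumes the excluded set holds the four standard asset columns plus the ERMrest system columns; the actual contents of `DerivaAssetColumns` are not shown on this page, so `EXCLUDED` is a hypothetical stand-in.

```python
STANDARD_ASSET_COLUMNS = {"URL", "Filename", "Length", "MD5"}
SYSTEM_COLUMNS = {"RID", "RCT", "RMT", "RCB", "RMB"}
EXCLUDED = STANDARD_ASSET_COLUMNS | SYSTEM_COLUMNS  # assumption, see lead-in


def metadata_names(column_names):
    """Return the column names that are neither standard asset nor system columns."""
    return set(column_names) - EXCLUDED


columns = [
    "RID", "RCT", "RMT", "RCB", "RMB",
    "URL", "Filename", "Length", "MD5",
    "Description", "Image_Class",
]
names = metadata_names(columns)
ordered = sorted(names)
```

Because the result is a set, callers that need a stable order have to sort it, as `sorted(names)` does here.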

asset_metadata_columns

asset_metadata_columns(
    table: TableInput,
) -> list[Column]

Return Column objects for the asset-metadata columns of table.

Like :meth:asset_metadata but returns the :class:Column instances (not just names) so callers can inspect attributes such as nullok. Results are sorted by column name for deterministic iteration.

Parameters:

Name Type Description Default
table TableInput

Asset table name or Table object.

required

Returns:

Type Description
list[Column]

Sorted list of Column objects.

Raises:

Type Description
DerivaMLTableTypeError

If table is not an asset table.

Example

>>> [c.name for c in model.asset_metadata_columns("Image")]  # doctest: +SKIP
['Description', 'Image_Class']

Source code in src/deriva_ml/model/catalog.py
def asset_metadata_columns(self, table: TableInput) -> list[Column]:
    """Return Column objects for the asset-metadata columns of ``table``.

    Like :meth:`asset_metadata` but returns the :class:`Column`
    instances (not just names) so callers can inspect attributes
    such as ``nullok``. Results are sorted by column name for
    deterministic iteration.

    Args:
        table: Asset table name or Table object.

    Returns:
        Sorted list of Column objects.

    Raises:
        DerivaMLTableTypeError: If ``table`` is not an asset table.

    Example:
        >>> [c.name for c in model.asset_metadata_columns("Image")]  # doctest: +SKIP
        ['Description', 'Image_Class']
    """
    table = self.name_to_table(table)
    if not self.is_asset(table):
        raise DerivaMLTableTypeError("asset table", table.name)
    return sorted(
        (c for c in table.columns if c.name not in DerivaAssetColumns),
        key=lambda c: c.name,
    )

asset_metadata_sorted

asset_metadata_sorted(
    table: TableInput,
) -> list[str]

Return the asset-metadata column names in deterministic order.

Sorted by name. Pins the alphabetic-order invariant in one place so call sites stay in lockstep:

  • :func:~deriva_ml.core.upload_layout.asset_table_upload_spec builds the upload regex from these names; the directory order in the staging tree must match the regex order.
  • :func:~deriva_ml.execution.bag_commit._add_asset_rows_to_bag emits metadata columns into the bag in the same order so the recorded rows align with the upload regex captures.

Pre-extraction, each call site re-wrote sorted(model.asset_metadata(table)) inline. Centralising the call shape means a future change to the ordering rule (e.g. case-insensitive sort, or sorted by FK target) lands once and everyone follows.

Parameters:

Name Type Description Default
table TableInput

Asset table name or :class:Table instance.

required

Returns:

Type Description
list[str]

Sorted list of metadata column names. Empty list if the table carries no extra columns.

Raises:

Type Description
DerivaMLTableTypeError

If table isn't an asset table.

Example

>>> from deriva_ml.model.catalog import DerivaModel  # doctest: +SKIP
>>> model.asset_metadata_sorted("Image")  # doctest: +SKIP
['Asset_Role', 'Description']

Source code in src/deriva_ml/model/catalog.py
def asset_metadata_sorted(self, table: TableInput) -> list[str]:
    """Return the asset-metadata column **names** in deterministic order.

    Sorted by name. Pins the alphabetic-order invariant in one
    place so call sites stay in lockstep:

    - :func:`~deriva_ml.core.upload_layout.asset_table_upload_spec`
      builds the upload regex from these names; the directory
      order in the staging tree must match the regex order.
    - :func:`~deriva_ml.execution.bag_commit._add_asset_rows_to_bag`
      emits metadata columns into the bag in the same order so
      the recorded rows align with the upload regex captures.

    Pre-extraction, each call site re-wrote
    ``sorted(model.asset_metadata(table))`` inline. Centralising
    the call shape means a future change to the ordering rule
    (e.g. case-insensitive sort, or sorted by FK target) lands
    once and everyone follows.

    Args:
        table: Asset table name or :class:`Table` instance.

    Returns:
        Sorted list of metadata column names. Empty list if
        the table carries no extra columns.

    Raises:
        DerivaMLTableTypeError: If ``table`` isn't an asset table.

    Example:
        >>> from deriva_ml.model.catalog import DerivaModel  # doctest: +SKIP
        >>> model.asset_metadata_sorted("Image")  # doctest: +SKIP
        ['Asset_Role', 'Description']
    """
    return sorted(self.asset_metadata(table))
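
Why the ordering has to be shared can be shown with a toy staging layout. Everything in this sketch is hypothetical (`build_pattern`, `build_path`, and the one-directory-per-column layout); it does not reproduce the real upload spec, only the idea that the path writer and the regex must iterate over the names in the same order.

```python
import re


def build_pattern(names):
    """One named capture group per metadata column, in the given order."""
    parts = [f"(?P<{name}>[^/]+)" for name in names]
    return re.compile("/".join(parts) + r"/(?P<file_name>[^/]+)$")


def build_path(values, names, file_name):
    """One directory level per metadata column, in the given order."""
    return "/".join([*(values[name] for name in names), file_name])


values = {"Image_Class": "cat", "Description": "blurry"}
names = sorted(values)  # the shared ordering rule

pattern = build_pattern(names)
aligned = pattern.match(build_path(values, names, "a.png")).groupdict()

# A writer that uses a different order produces a path the regex misreads.
misaligned = pattern.match(
    build_path(values, list(reversed(names)), "a.png")
).groupdict()
```

In the misaligned case the match still succeeds, but the captured values are attached to the wrong columns, which is the failure a single shared ordering helper is meant to prevent.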

create_table

create_table(
    table_def: TableDefinition,
    schema: str | None = None,
) -> Table

Create a new table from TableDefinition.

Parameters:

Name Type Description Default
table_def TableDefinition

Table definition (dataclass or dict).

required
schema str | None

Schema to create the table in. If None, uses default_schema.

None

Returns:

Type Description
Table

The newly created Table.

Raises:

Type Description
DerivaMLException

If no schema specified and default_schema is not set.

Note: @validate_call removed because TableDefinition is now a dataclass from deriva.core.typed and Pydantic validation doesn't work well with dataclass fields.

Example

>>> from deriva_ml.core.definitions import TableDefinition, ColumnDefinition  # doctest: +SKIP
>>> table_def = TableDefinition(  # doctest: +SKIP
...     name="Observation",
...     column_defs=[ColumnDefinition(name="Note", type="text")],
... )
>>> new_table = model.create_table(table_def, schema="my_domain")  # doctest: +SKIP

Source code in src/deriva_ml/model/catalog.py
def create_table(self, table_def: TableDefinition, schema: str | None = None) -> Table:
    """Create a new table from TableDefinition.

    Args:
        table_def: Table definition (dataclass or dict).
        schema: Schema to create the table in. If None, uses default_schema.

    Returns:
        The newly created Table.

    Raises:
        DerivaMLException: If no schema specified and default_schema is not set.

    Note: @validate_call removed because TableDefinition is now a dataclass from
    deriva.core.typed and Pydantic validation doesn't work well with dataclass fields.

    Example:
        >>> from deriva_ml.core.definitions import TableDefinition, ColumnDefinition  # doctest: +SKIP
        >>> table_def = TableDefinition(  # doctest: +SKIP
        ...     name="Observation",
        ...     column_defs=[ColumnDefinition(name="Note", type="text")],
        ... )
        >>> new_table = model.create_table(table_def, schema="my_domain")  # doctest: +SKIP
    """
    schema = schema or self._require_default_schema()
    # Handle both TableDefinition (dataclass with to_dict) and plain dicts
    table_dict = table_def.to_dict() if hasattr(table_def, "to_dict") else table_def
    return self.model.schemas[schema].create_table(table_dict)

find_asset_execution_tables

find_asset_execution_tables() -> (
    list[tuple[str, str]]
)

Return the *_Execution association tables across all schemas.

Walks every domain + ML schema once, finds tables whose name ends with _Execution, and caches the result on the instance. Subsequent calls re-use the cache so callers that walk these tables repeatedly (e.g. :func:~deriva_ml.execution._helpers.list_assets) pay the schema-iteration cost exactly once per :class:DerivaModel lifetime.

Two *_Execution tables are excluded because they're not asset-to-execution association tables despite the suffix:

  • Dataset_Execution — dataset linkage; consumed by :func:~deriva_ml.execution._helpers.list_input_datasets.
  • Execution_Execution — nested-execution hierarchy (parent/child); has no Asset_Role column. Hitting it during the list_assets(asset_role=...) walk produces an AttributeError ("no such column Asset_Role") rather than just returning zero matches, so the exclusion is correctness-critical.

The cache is invalidated whenever the underlying model object identity changes (e.g. after a catalog Model.fromcatalog refetch). In practice the model is only refreshed in long-lived sessions that mutate schema — the common case (read-mostly scripts) hits the cache on every call after the first.

Returns:

Type Description
list[tuple[str, str]]

List of (schema_name, table_name) pairs, ordered by schema-then-table for deterministic iteration.

Example

>>> from deriva_ml.model.catalog import DerivaModel  # doctest: +SKIP
>>> model.find_asset_execution_tables()  # doctest: +SKIP
[('deriva-ml', 'Execution_Asset_Execution'),
 ('test_schema', 'Image_Execution')]

Source code in src/deriva_ml/model/catalog.py
def find_asset_execution_tables(self) -> list[tuple[str, str]]:
    """Return the ``*_Execution`` association tables across all schemas.

    Walks every domain + ML schema once, finds tables whose
    name ends with ``_Execution``, and caches the result on
    the instance. Subsequent calls re-use the cache so callers
    that walk these tables repeatedly (e.g.
    :func:`~deriva_ml.execution._helpers.list_assets`) pay
    the schema-iteration cost exactly once per
    :class:`DerivaModel` lifetime.

    Two ``*_Execution`` tables are **excluded** because they're
    not asset-to-execution association tables despite the
    suffix:

    - ``Dataset_Execution`` — dataset linkage; consumed by
      :func:`~deriva_ml.execution._helpers.list_input_datasets`.
    - ``Execution_Execution`` — nested-execution hierarchy
      (parent/child); has no ``Asset_Role`` column. Hitting
      it during the ``list_assets(asset_role=...)`` walk
      produces an ``AttributeError`` ("no such column
      ``Asset_Role``") rather than just returning zero
      matches, so the exclusion is correctness-critical.

    The cache is invalidated whenever the underlying model
    object identity changes (e.g. after a catalog
    ``Model.fromcatalog`` refetch). In practice the model is
    only refreshed in long-lived sessions that mutate schema
    — the common case (read-mostly scripts) hits the cache
    on every call after the first.

    Returns:
        List of ``(schema_name, table_name)`` pairs, ordered
        by schema-then-table for deterministic iteration.

    Example:
        >>> from deriva_ml.model.catalog import DerivaModel  # doctest: +SKIP
        >>> model.find_asset_execution_tables()  # doctest: +SKIP
        [('deriva-ml', 'Execution_Asset_Execution'),
         ('test_schema', 'Image_Execution')]
    """
    # Cache key is the underlying model object's identity —
    # a refetch swaps it out and we recompute. Cheap, safe.
    cached = getattr(self, "_asset_execution_tables_cache", None)
    if cached is not None and cached[0] is self.model:
        return cached[1]

    result: list[tuple[str, str]] = []
    schemas_to_search = [*sorted(self.domain_schemas), self.ml_schema]
    for schema_name in schemas_to_search:
        schema_obj = self.model.schemas.get(schema_name)
        if schema_obj is None:
            continue
        for table in schema_obj.tables.values():
            if not table.name.endswith("_Execution"):
                continue
            # ``Dataset_Execution`` is the dataset linkage; not an
            # asset association. ``Execution_Execution`` is the
            # nested-execution parent/child table; has no
            # ``Asset_Role`` column so it'd crash any
            # ``list_assets(asset_role=...)`` walk.
            if table.name in ("Dataset_Execution", "Execution_Execution"):
                continue
            result.append((schema_name, table.name))

    self._asset_execution_tables_cache = (self.model, result)
    return result
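
The caching scheme keys on the identity of the model object rather than on its contents. A minimal sketch, assuming a hypothetical `Finder` class whose "model" is just a dict of schema name to table names; as a simplification it sorts all schemas together instead of listing domain schemas first and the ML schema last.

```python
EXCLUDED = {"Dataset_Execution", "Execution_Execution"}


class Finder:
    """Hypothetical stand-in showing an identity-keyed cache."""

    def __init__(self, model):
        self.model = model  # dict: schema name -> list of table names
        self.scans = 0      # counts how often the schemas are actually walked
        self._cache = None

    def execution_tables(self):
        # Reuse the cached result only if it was computed for this exact object.
        if self._cache is not None and self._cache[0] is self.model:
            return self._cache[1]
        self.scans += 1
        result = [
            (schema, table)
            for schema in sorted(self.model)
            for table in self.model[schema]
            if table.endswith("_Execution") and table not in EXCLUDED
        ]
        self._cache = (self.model, result)
        return result


finder = Finder({
    "deriva-ml": ["Execution", "Dataset_Execution", "Execution_Execution",
                  "Execution_Asset_Execution"],
    "lab": ["Image", "Image_Execution"],
})
first = finder.execution_tables()
second = finder.execution_tables()   # served from the cache
finder.model = dict(finder.model)    # equal contents, new object: cache miss
third = finder.execution_tables()
```

A consequence of comparing with `is`: in this sketch, mutating the same model object in place does not invalidate the cache; only swapping in a new object does.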

find_assets

find_assets() -> list[Table]

Return the list of asset tables in the current model.

Returns:

Type Description
list[Table]

All tables across every schema that satisfy :meth:is_asset.

Example

>>> [t.name for t in model.find_assets()]  # doctest: +SKIP
['Image', 'Execution_Asset']

Source code in src/deriva_ml/model/catalog.py
def find_assets(self) -> list[Table]:
    """Return the list of asset tables in the current model.

    Returns:
        All tables across every schema that satisfy :meth:`is_asset`.

    Example:
        >>> [t.name for t in model.find_assets()]  # doctest: +SKIP
        ['Image', 'Execution_Asset']
    """
    return [t for s in self.model.schemas.values() for t in s.tables.values() if self.is_asset(t)]

find_association

find_association(
    table1: TableInput,
    table2: TableInput,
) -> tuple[Table, str, str]

Return the unique association table linking table1 and table2.

Searches all associations on table1 for one whose other-side FK lands on table2. The result lets callers JOIN through the link without re-deriving the column names by hand.

Parameters:

Name Type Description Default
table1 TableInput

Either endpoint of the association. Table name or :class:Table.

required
table2 TableInput

The other endpoint. Table name or :class:Table.

required

Returns:

Type Description
tuple[Table, str, str]

(assoc_table, table1_link_column, table2_link_column): the association :class:Table itself plus the names (as str) of the two FK columns on it (one referencing table1, one referencing table2). The column names are returned as strings because every caller uses them directly as datapath .columns[...] keys or insert-row dict keys.

Raises:

Type Description
NoAssociationException

If no association table connects the two tables. Callers that legitimately handle the "no link" case (e.g. probing whether an asset table is tracked through Execution) should catch this specific subclass rather than the broader :class:DerivaMLException.

AmbiguousAssociationException

If multiple association tables connect the two tables. The caller must disambiguate by naming the desired association table directly.

Example

>>> assoc, c1, c2 = model.find_association("Dataset", "Image")  # doctest: +SKIP
>>> assoc.name, c1, c2  # doctest: +SKIP
('Dataset_Image', 'Dataset', 'Image')

Source code in src/deriva_ml/model/catalog.py
def find_association(self, table1: TableInput, table2: TableInput) -> tuple[Table, str, str]:
    """Return the unique association table linking ``table1`` and ``table2``.

    Searches all associations on ``table1`` for one whose other-side
    FK lands on ``table2``. The result lets callers JOIN through the
    link without re-deriving the column names by hand.

    Args:
        table1: Either endpoint of the association. Table name or
            :class:`Table`.
        table2: The other endpoint. Table name or :class:`Table`.

    Returns:
        ``(assoc_table, table1_link_column, table2_link_column)``
        — the association :class:`Table` itself plus the *names* (as
        ``str``) of the two FK columns on it (one referencing
        ``table1``, one referencing ``table2``). The column names are
        returned as strings because every caller uses them directly
        as ``datapath`` ``.columns[...]`` keys or insert-row dict keys.

    Raises:
        NoAssociationException: If no association table connects the
            two tables. Callers that legitimately handle the "no link"
            case (e.g. probing whether an asset table is tracked
            through ``Execution``) should catch this specific subclass
            rather than the broader :class:`DerivaMLException`.
        AmbiguousAssociationException: If multiple association tables
            connect the two tables. The caller must disambiguate by
            naming the desired association table directly.

    Example:
        >>> assoc, c1, c2 = model.find_association("Dataset", "Image")  # doctest: +SKIP
        >>> assoc.name, c1, c2  # doctest: +SKIP
        ('Dataset_Image', 'Dataset', 'Image')
    """
    table1 = self.name_to_table(table1)
    table2 = self.name_to_table(table2)

    tables = [
        (a.table, a.self_fkey.columns[0].name, other_key.columns[0].name)
        for a in table1.find_associations(pure=False)
        if len(a.other_fkeys) == 1 and (other_key := a.other_fkeys.pop()).pk_table == table2
    ]

    if len(tables) == 1:
        return tables[0]
    elif len(tables) == 0:
        raise NoAssociationException(table1.name, table2.name)
    else:
        raise AmbiguousAssociationException(table1.name, table2.name, len(tables))
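
The "exactly one match, otherwise raise" contract can be sketched over a toy catalog. The data structure and names below are hypothetical: `ASSOCIATIONS` maps each association table to its FK columns and the tables they reference, and the two exception classes stand in for `NoAssociationException` and `AmbiguousAssociationException`. Self-associations are ignored for simplicity.

```python
class NoAssociation(LookupError):
    """Hypothetical stand-in for NoAssociationException."""


class AmbiguousAssociation(LookupError):
    """Hypothetical stand-in for AmbiguousAssociationException."""


# association table -> {FK column name: referenced table}
ASSOCIATIONS = {
    "Dataset_Image": {"Dataset": "Dataset", "Image": "Image"},
    "Image_Subject": {"Image": "Image", "Subject": "Subject"},
    "Image_Subject_Review": {"Image": "Image", "Subject": "Subject"},
}


def find_association(table1, table2):
    """Return (assoc_name, table1_column, table2_column) for the unique link."""
    matches = []
    for name, fkeys in ASSOCIATIONS.items():
        cols1 = [col for col, target in fkeys.items() if target == table1]
        cols2 = [col for col, target in fkeys.items() if target == table2]
        if table1 != table2 and len(fkeys) == 2 and cols1 and cols2:
            matches.append((name, cols1[0], cols2[0]))
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise NoAssociation(f"{table1} and {table2} are not linked")
    raise AmbiguousAssociation(f"{len(matches)} links between {table1} and {table2}")


link = find_association("Dataset", "Image")
reverse = find_association("Image", "Dataset")
```

The column names come back in argument order, so swapping the arguments swaps the two returned columns.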

find_features

find_features(
    table: TableInput | None = None,
) -> Iterable[Feature]

List features in the catalog.

If a table is specified, returns only features for that table. If no table is specified, returns all features across all tables in the catalog.

Parameters:

Name Type Description Default
table TableInput | None

Optional table to find features for. If None, returns all features in the catalog.

None

Returns:

Type Description
Iterable[Feature]

An iterable of Feature instances describing the features.

Example

>>> [f.feature_name for f in model.find_features("Image")]  # doctest: +SKIP
['BoundingBox', 'Quality']
>>> all_features = list(model.find_features())  # doctest: +SKIP

Source code in src/deriva_ml/model/catalog.py
@validate_call(config=VALIDATION_CONFIG)
def find_features(self, table: TableInput | None = None) -> Iterable[Feature]:
    """List features in the catalog.

    If a table is specified, returns only features for that table.
    If no table is specified, returns all features across all tables in the catalog.

    Args:
        table: Optional table to find features for. If None, returns all features
            in the catalog.

    Returns:
        An iterable of Feature instances describing the features.

    Example:
        >>> [f.feature_name for f in model.find_features("Image")]  # doctest: +SKIP
        ['BoundingBox', 'Quality']
        >>> all_features = list(model.find_features())  # doctest: +SKIP
    """

    def is_feature(a: FindAssociationResult) -> bool:
        """Check if association represents a feature.

        Args:
            a: Association result to check
        Returns:
            bool: True if association represents a feature
        """
        return {
            "Feature_Name",
            "Execution",
            a.self_fkey.foreign_key_columns[0].name,
        }.issubset({c.name for c in a.table.columns})

    def find_table_features(t: Table) -> list[Feature]:
        """Find all features for a single table.

        ``max_arity`` is left unbounded (``None``) so that
        *key-qualified* multi-value features are discovered. A
        qualifier is a value FK that participates in the
        association table's compound uniqueness key — e.g.
        ``Image_Side`` on eye-ai's ``Execution_Subject_Chart_Label``,
        where the same Subject legitimately has a left-eye and a
        right-eye row. Such a key includes
        ``{Execution, Subject, Feature_Name, Image_Side}``, giving a
        key-FK arity of 4. The former ``max_arity=3`` cap silently
        excluded these features from discovery (and therefore from
        ``lookup_feature`` / ``feature_values``).

        ``is_feature`` remains the sole filter: it still requires the
        ``Feature_Name`` and ``Execution`` FKs plus the target FK, so
        removing the ceiling cannot admit a non-feature association
        (a plain N-way domain join lacks ``Feature_Name``).
        ``min_arity=3`` is retained as the lower bound.
        """
        return [
            Feature(a, self) for a in t.find_associations(min_arity=3, max_arity=None, pure=False) if is_feature(a)
        ]

    if table is not None:
        # Find features for a specific table
        return find_table_features(self.name_to_table(table))

    # No table arg: discover features across the whole catalog.
    #
    # ``find_associations`` walks ``Table.referenced_by`` from each
    # candidate table, so the same association table is visited
    # once per FK target. For a single ``Image.Image_Classification``
    # feature backed by ``Execution_Image_Image_Classification``
    # (an association with FKs to Image, Execution, and the
    # Image_Class vocab) the naive cross-schema scan yields three
    # Feature objects -- one with ``target_table=Image`` (the
    # actual target), one with ``target_table=Execution``, and one
    # with ``target_table=Image_Class``. Only the first is what
    # callers want. See
    # docs/bugs/2026-05-19-find-features-duplicates.md.
    #
    # The fix is twofold:
    # 1. Skip iteration over tables that can never be the actual
    #    feature target -- the Execution table and any vocabulary
    #    table. Every feature association references both, so
    #    scanning them only produces duplicates.
    # 2. Dedup the remaining list by the association table itself
    #    (qualified schema.name), in case multiple distinct target
    #    tables share an association in some non-canonical layout.
    ml_schema_obj = self.model.schemas.get(self.ml_schema)
    execution_table = ml_schema_obj.tables.get("Execution") if ml_schema_obj is not None else None

    seen_feature_tables: set[tuple[str, str]] = set()
    features: list[Feature] = []
    for schema_name in [*self.domain_schemas, self.ml_schema]:
        schema = self.model.schemas.get(schema_name)
        if schema is None:
            continue
        for t in schema.tables.values():
            if execution_table is not None and t is execution_table:
                continue
            if self.is_vocabulary(t):
                continue
            for f in find_table_features(t):
                key = (f.feature_table.schema.name, f.feature_table.name)
                if key in seen_feature_tables:
                    continue
                seen_feature_tables.add(key)
                features.append(f)
    return features
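The two-step fix described in the comments above can be illustrated without a catalog. The following is a minimal sketch, not the library's code: `Candidate` and `dedup_features` are hypothetical names, and plain tuples stand in for `Feature` objects.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    """Stand-in for a Feature: the table it was found from and its backing table."""

    target_table: str
    feature_schema: str
    feature_table: str


def dedup_features(candidates: list[Candidate], skip_targets: set[str]) -> list[Candidate]:
    """Drop candidates found via skipped targets, then keep one per backing table."""
    seen: set[tuple[str, str]] = set()
    kept: list[Candidate] = []
    for c in candidates:
        # Step 1: never treat Execution or a vocabulary table as the target.
        if c.target_table in skip_targets:
            continue
        # Step 2: dedup by the qualified name of the association table.
        key = (c.feature_schema, c.feature_table)
        if key in seen:
            continue
        seen.add(key)
        kept.append(c)
    return kept


# The naive scan reports the same association once per FK target.
naive = [
    Candidate("Image", "domain", "Execution_Image_Image_Classification"),
    Candidate("Execution", "domain", "Execution_Image_Image_Classification"),
    Candidate("Image_Class", "domain", "Execution_Image_Image_Classification"),
]
result = dedup_features(naive, skip_targets={"Execution", "Image_Class"})
```

Only the candidate whose target is `Image` survives, which matches the behaviour the comments describe for the catalog-wide scan.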

find_vocabularies

find_vocabularies() -> list[Table]

Return a list of all controlled vocabulary tables in domain and ML schemas.

Returns:

Type Description
list[Table]

All tables in the domain and ML schemas that satisfy :meth:is_vocabulary.

Example

>>> [t.name for t in model.find_vocabularies()]  # doctest: +SKIP
['Image_Class', 'Workflow_Type']

Source code in src/deriva_ml/model/catalog.py
def find_vocabularies(self) -> list[Table]:
    """Return a list of all controlled vocabulary tables in domain and ML schemas.

    Returns:
        All tables in the domain and ML schemas that satisfy
        :meth:`is_vocabulary`.

    Example:
        >>> [t.name for t in model.find_vocabularies()]  # doctest: +SKIP
        ['Image_Class', 'Workflow_Type']
    """
    tables = []
    for schema_name in [*self.domain_schemas, self.ml_schema]:
        schema = self.model.schemas.get(schema_name)
        if schema:
            tables.extend(t for t in schema.tables.values() if self.is_vocabulary(t))
    return tables

from_cached classmethod

from_cached(
    schema_dict: dict,
    *,
    catalog,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: "str | set[str] | None" = None,
    default_schema: "str | None" = None,
) -> "DerivaModel"

Construct a DerivaModel from a cached ermrest /schema dict.

No network is touched. The catalog argument is passed to deriva-py's Model(catalog, model_doc) constructor as the first positional argument; in offline mode it will be a :class:~deriva_ml.core.catalog_stub.CatalogStub, in online mode it is a real ErmrestCatalog. DerivaModel.__init__ then reads the catalog back off model.catalog as usual.

This replicates what Model.fromcatalog(catalog) does online — the online call fetches the schema dict via catalog.getCatalogSchema() (cached and ETag-revalidated by deriva-py) and passes the result to Model(catalog, dict). Here we pass in the already-cached dict from :class:~deriva_ml.core.schema_cache.SchemaCache.

Parameters:

Name Type Description Default
schema_dict dict

The JSON payload from a previous catalog.getCatalogSchema() call (or any equivalent /schema GET), as persisted by SchemaCache.

required
catalog

The catalog object to associate with the model. Pass a real ErmrestCatalog online, or a CatalogStub offline.

required
ml_schema str

ML schema name (default "deriva-ml").

ML_SCHEMA
domain_schemas 'str | set[str] | None'

Optional explicit set of domain schema names. If None, auto-detects all non-system schemas from the cached dict.

None
default_schema 'str | None'

Optional default schema name.

None

Returns:

Type Description
'DerivaModel'

A DerivaModel wrapping a deriva-py Model reconstructed from the dict.

Example

>>> cached = schema_cache.load(hostname, catalog_id)  # doctest: +SKIP
>>> model = DerivaModel.from_cached(  # doctest: +SKIP
...     cached, catalog=catalog_stub, ml_schema="deriva-ml"
... )

Source code in src/deriva_ml/model/catalog.py
@classmethod
def from_cached(
    cls,
    schema_dict: dict,
    *,
    catalog,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: "str | set[str] | None" = None,
    default_schema: "str | None" = None,
) -> "DerivaModel":
    """Construct a DerivaModel from a cached ermrest /schema dict.

    No network is touched. The ``catalog`` argument is passed to
    deriva-py's ``Model(catalog, model_doc)`` constructor as the
    first positional argument; in offline mode it will be a
    :class:`~deriva_ml.core.catalog_stub.CatalogStub`, in online
    mode it is a real ``ErmrestCatalog``. ``DerivaModel.__init__``
    then reads the catalog back off ``model.catalog`` as usual.

    This replicates what ``Model.fromcatalog(catalog)`` does
    online — the online call fetches the schema dict via
    ``catalog.getCatalogSchema()`` (cached and ETag-revalidated
    by deriva-py) and passes the result to ``Model(catalog, dict)``.
    Here we pass in the already-cached dict from
    :class:`~deriva_ml.core.schema_cache.SchemaCache`.

    Args:
        schema_dict: The JSON payload from a previous
            ``catalog.getCatalogSchema()`` call (or any equivalent
            ``/schema`` GET), as persisted by ``SchemaCache``.
        catalog: The catalog object to associate with the model.
            Pass a real ``ErmrestCatalog`` online, or a
            ``CatalogStub`` offline.
        ml_schema: ML schema name (default ``"deriva-ml"``).
        domain_schemas: Optional explicit set of domain schema
            names. If None, auto-detects all non-system schemas
            from the cached dict.
        default_schema: Optional default schema name.

    Returns:
        A ``DerivaModel`` wrapping a deriva-py ``Model``
        reconstructed from the dict.

    Example:
        >>> cached = schema_cache.load(hostname, catalog_id)  # doctest: +SKIP
        >>> model = DerivaModel.from_cached(  # doctest: +SKIP
        ...     cached, catalog=catalog_stub, ml_schema="deriva-ml"
        ... )
    """
    # Model.__init__(catalog, model_doc) stores catalog as
    # self._catalog and exposes it via the .catalog property;
    # DerivaModel.__init__ then reads self.model.catalog.
    model = Model(catalog, schema_dict)
    return cls(
        model,
        ml_schema=ml_schema,
        domain_schemas=domain_schemas,
        default_schema=default_schema,
    )
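The offline path depends on a schema payload having been saved earlier and reloaded unchanged. The sketch below shows only that round trip, under stated assumptions: `cache_path`, `save_schema`, and `load_schema` are hypothetical helpers, and the one-file-per-catalog layout is invented for illustration. It is not how `SchemaCache` is implemented.

```python
import json
import tempfile
from pathlib import Path


def cache_path(root: Path, hostname: str, catalog_id: str) -> Path:
    """Hypothetical layout: one JSON file per (hostname, catalog_id)."""
    return root / hostname / f"{catalog_id}.json"


def save_schema(root: Path, hostname: str, catalog_id: str, schema_dict: dict) -> None:
    """Persist a /schema payload fetched while online."""
    path = cache_path(root, hostname, catalog_id)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(schema_dict))


def load_schema(root: Path, hostname: str, catalog_id: str) -> dict:
    """Reload the payload later without touching the network."""
    return json.loads(cache_path(root, hostname, catalog_id).read_text())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Toy payload; a real /schema document is much larger.
    online_payload = {"schemas": {"deriva-ml": {"tables": {}}}}
    save_schema(root, "example.org", "1", online_payload)
    cached = load_schema(root, "example.org", "1")
```

A dict reloaded this way plays the role of `schema_dict`; the `catalog` argument would still need to be supplied separately, as the parameter table describes.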

get_schema_description

get_schema_description(
    include_system_columns: bool = False,
) -> dict[str, Any]

Return a JSON description of the catalog schema structure.

Provides a structured representation of the domain and ML schemas including tables, columns, foreign keys, and relationships. Useful for understanding the data model structure programmatically.

Parameters:

Name Type Description Default
include_system_columns bool

If True, include RID, RCT, RMT, RCB, RMB columns. Default False to reduce output size.

False

Returns:

Type Description
dict[str, Any]

Dictionary with schema structure:

{
    "domain_schemas": ["schema_name1", "schema_name2"],
    "default_schema": "schema_name1",
    "ml_schema": "deriva-ml",
    "schemas": {
        "schema_name": {
            "tables": {
                "TableName": {
                    "comment": "description",
                    "is_vocabulary": bool,
                    "is_asset": bool,
                    "is_association": bool,
                    "columns": [...],
                    "foreign_keys": [...],
                    "features": [...]
                }
            }
        }
    }
}

The "features" key is present only for tables that have at least one feature.

Example

>>> desc = model.get_schema_description()  # doctest: +SKIP
>>> sorted(desc["schemas"])  # doctest: +SKIP
['deriva-ml', 'my_domain']
>>> desc["schemas"]["my_domain"]["tables"]["Image"]["is_asset"]  # doctest: +SKIP
True

Source code in src/deriva_ml/model/catalog.py
def get_schema_description(self, include_system_columns: bool = False) -> dict[str, Any]:
    """Return a JSON description of the catalog schema structure.

    Provides a structured representation of the domain and ML schemas including
    tables, columns, foreign keys, and relationships. Useful for understanding
    the data model structure programmatically.

    Args:
        include_system_columns: If True, include RID, RCT, RMT, RCB, RMB columns.
            Default False to reduce output size.

    Returns:
        Dictionary with schema structure:
        {
            "domain_schemas": ["schema_name1", "schema_name2"],
            "default_schema": "schema_name1",
            "ml_schema": "deriva-ml",
            "schemas": {
                "schema_name": {
                    "tables": {
                        "TableName": {
                            "comment": "description",
                            "is_vocabulary": bool,
                            "is_asset": bool,
                            "is_association": bool,
                            "columns": [...],
                            "foreign_keys": [...],
                            "features": [...]
                        }
                    }
                }
            }
        }

        The ``"features"`` key is present only for tables that have
        at least one feature.

    Example:
        >>> desc = model.get_schema_description()  # doctest: +SKIP
        >>> sorted(desc["schemas"])  # doctest: +SKIP
        ['deriva-ml', 'my_domain']
        >>> desc["schemas"]["my_domain"]["tables"]["Image"]["is_asset"]  # doctest: +SKIP
        True
    """
    system_columns = {"RID", "RCT", "RMT", "RCB", "RMB"}
    result = {
        "domain_schemas": sorted(self.domain_schemas),
        "default_schema": self.default_schema,
        "ml_schema": self.ml_schema,
        "schemas": {},
    }

    # Include all domain schemas and the ML schema
    for schema_name in [*self.domain_schemas, self.ml_schema]:
        schema = self.model.schemas.get(schema_name)
        if not schema:
            continue

        schema_info = {"tables": {}}

        for table_name, table in schema.tables.items():
            # Get columns
            columns = []
            for col in table.columns:
                if not include_system_columns and col.name in system_columns:
                    continue
                columns.append(
                    {
                        "name": col.name,
                        "type": str(col.type.typename),
                        "nullok": col.nullok,
                        "comment": col.comment or "",
                    }
                )

            # Get foreign keys
            foreign_keys = []
            for fk in table.foreign_keys:
                fk_cols = [c.name for c in fk.foreign_key_columns]
                ref_cols = [c.name for c in fk.referenced_columns]
                foreign_keys.append(
                    {
                        "columns": fk_cols,
                        "referenced_table": f"{fk.pk_table.schema.name}.{fk.pk_table.name}",
                        "referenced_columns": ref_cols,
                    }
                )

            # Get features if this is a domain table
            features = []
            if self.is_domain_schema(schema_name):
                try:
                    for f in self.find_features(table):
                        features.append(
                            {
                                "name": f.feature_name,
                                "feature_table": f.feature_table.name,
                            }
                        )
                except Exception as e:
                    logger.debug(f"Could not enumerate features for table {table.name}: {e}")

            table_info = {
                "comment": table.comment or "",
                "is_vocabulary": self.is_vocabulary(table),
                "is_asset": self.is_asset(table),
                "is_association": bool(self.is_association(table)),
                "columns": columns,
                "foreign_keys": foreign_keys,
            }
            if features:
                table_info["features"] = features

            schema_info["tables"][table_name] = table_info

        result["schemas"][schema_name] = schema_info

    return result
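Because the result is a plain nested dict, it can be queried with ordinary comprehensions. The sketch below works on a hand-written sample in the documented shape rather than a live catalog; `tables_where` and `feature_names` are hypothetical helpers. Note that the listing above only sets `"features"` when the list is non-empty, so the sketch reads it with `.get`.

```python
from typing import Any

# Hand-written sample in the documented shape; a real catalog returns more.
sample: dict[str, Any] = {
    "domain_schemas": ["my_domain"],
    "default_schema": "my_domain",
    "ml_schema": "deriva-ml",
    "schemas": {
        "my_domain": {
            "tables": {
                "Image": {
                    "comment": "",
                    "is_vocabulary": False,
                    "is_asset": True,
                    "is_association": False,
                    "columns": [],
                    "foreign_keys": [],
                    "features": [
                        {"name": "Quality", "feature_table": "Execution_Image_Quality"}
                    ],
                },
                "Subject": {
                    "comment": "",
                    "is_vocabulary": False,
                    "is_asset": False,
                    "is_association": False,
                    "columns": [],
                    "foreign_keys": [],
                },
            }
        }
    },
}


def tables_where(desc: dict[str, Any], flag: str) -> list[str]:
    """Return sorted 'schema.table' names whose boolean flag is true."""
    return sorted(
        f"{schema_name}.{table_name}"
        for schema_name, schema in desc["schemas"].items()
        for table_name, table in schema["tables"].items()
        if table.get(flag)
    )


def feature_names(desc: dict[str, Any], schema: str, table: str) -> list[str]:
    """List feature names for a table, tolerating a missing "features" key."""
    info = desc["schemas"][schema]["tables"][table]
    return [f["name"] for f in info.get("features", [])]


assets = tables_where(sample, "is_asset")
```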

is_asset

is_asset(table: TableInput) -> bool

Check whether table is a proper asset table.

Delegates to :meth:Table.is_asset from deriva-py, which verifies:

  • Required columns exist (URL, Filename, Length, MD5).
  • URL, Length, MD5 are NOT NULL.
  • URL carries the asset annotation.

Parameters:

Name Type Description Default
table TableInput

Table name or :class:Table to inspect.

required

Returns:

Type Description
bool

True if all asset-table requirements are satisfied.

Raises:

Type Description
DerivaMLTableNotFound

If table doesn't exist in any searchable schema (raised by :meth:name_to_table).

Example

>>> model.is_asset("Image")  # doctest: +SKIP
True
>>> model.is_asset("Subject")  # doctest: +SKIP
False

Source code in src/deriva_ml/model/catalog.py
def is_asset(self, table: TableInput) -> bool:
    """Check whether ``table`` is a proper asset table.

    Delegates to :meth:`Table.is_asset` from deriva-py, which verifies:

    - Required columns exist (``URL``, ``Filename``, ``Length``, ``MD5``).
    - ``URL``, ``Length``, ``MD5`` are NOT NULL.
    - ``URL`` carries the ``asset`` annotation.

    Args:
        table: Table name or :class:`Table` to inspect.

    Returns:
        True if all asset-table requirements are satisfied.

    Raises:
        DerivaMLTableNotFound: If ``table`` doesn't exist in any searchable
            schema (raised by :meth:`name_to_table`).

    Example:
        >>> model.is_asset("Image")  # doctest: +SKIP
        True
        >>> model.is_asset("Subject")  # doctest: +SKIP
        False
    """
    table = self.name_to_table(table)
    return table.is_asset()
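The three requirements listed in the docstring can be sketched over plain dictionaries. This is an illustration of the documented checks only, not deriva-py's implementation: `looks_like_asset` is a hypothetical name, the column representation is invented, and the annotation key in `ASSET_TAG` is an assumption about which tag marks an asset column.

```python
ASSET_TAG = "tag:isrd.isi.edu,2017:asset"  # assumed annotation key
REQUIRED = ("URL", "Filename", "Length", "MD5")
NOT_NULL = ("URL", "Length", "MD5")


def looks_like_asset(columns: dict[str, dict]) -> bool:
    """columns maps a column name to {"nullok": bool, "annotations": dict}."""
    # 1. Every required column must exist.
    if any(name not in columns for name in REQUIRED):
        return False
    # 2. URL, Length, and MD5 must be NOT NULL.
    if any(columns[name]["nullok"] for name in NOT_NULL):
        return False
    # 3. URL must carry the asset annotation.
    return ASSET_TAG in columns["URL"]["annotations"]


image_columns = {
    "URL": {"nullok": False, "annotations": {ASSET_TAG: {}}},
    "Filename": {"nullok": True, "annotations": {}},
    "Length": {"nullok": False, "annotations": {}},
    "MD5": {"nullok": False, "annotations": {}},
    "Description": {"nullok": True, "annotations": {}},
}
subject_columns = {"Name": {"nullok": False, "annotations": {}}}
```

Under these assumptions `image_columns` passes all three checks and `subject_columns` fails the first.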

is_association

is_association(
    table: TableInput,
    unqualified: bool = True,
    pure: bool = True,
    min_arity: int = 2,
    max_arity: int = 2,
) -> bool | set[str] | int

Check whether table is an association (linking) table.

Delegates to :meth:deriva.core.ermrest_model.Table.is_association. An association table mediates a many-to-many relationship between two (or more) tables via outbound FKs to each end.

Parameters:

Name Type Description Default
table TableInput

Table name or :class:Table to inspect.

required
unqualified bool

Per deriva-py — if True, the returned column set uses bare column names (no schema/table qualification). Only consulted when the return mode is the column-name set.

True
pure bool

If True, require a pure association — no extra payload columns beyond the FK columns and system metadata (RID, RCT, RMT, RCB, RMB). Excludes feature tables, which carry their own non-FK columns.

True
min_arity int

Minimum number of outbound FKs that count as "associating." Defaults to 2 (a binary association).

2
max_arity int

Maximum number of outbound FKs. Defaults to 2.

2

Returns:

Type Description
bool | set[str] | int

bool when the question is "is this any association at the requested arity," or set[str] / int when deriva-py's is_association returns the structural detail set instead. See :meth:Table.is_association for the full contract.

Raises:

Type Description
DerivaMLTableNotFound

If table doesn't exist in any searchable schema (raised by :meth:name_to_table).

Example

>>> bool(model.is_association("Dataset_Image"))  # doctest: +SKIP
True
>>> bool(model.is_association("Image"))  # doctest: +SKIP
False

Source code in src/deriva_ml/model/catalog.py
def is_association(
    self,
    table: TableInput,
    unqualified: bool = True,
    pure: bool = True,
    min_arity: int = 2,
    max_arity: int = 2,
) -> bool | set[str] | int:
    """Check whether ``table`` is an association (linking) table.

    Delegates to :meth:`deriva.core.ermrest_model.Table.is_association`.
    An association table mediates a many-to-many relationship between
    two (or more) tables via outbound FKs to each end.

    Args:
        table: Table name or :class:`Table` to inspect.
        unqualified: Per deriva-py — if True, the returned column set
            uses bare column names (no schema/table qualification).
            Only consulted when the return mode is the column-name set.
        pure: If True, require a *pure* association — no extra payload
            columns beyond the FK columns and system metadata (RID,
            RCT, RMT, RCB, RMB). Excludes feature tables, which carry
            their own non-FK columns.
        min_arity: Minimum number of outbound FKs that count as
            "associating." Defaults to 2 (a binary association).
        max_arity: Maximum number of outbound FKs. Defaults to 2.

    Returns:
        ``bool`` when the question is "is this *any* association at the
        requested arity," or ``set[str]`` / ``int`` when deriva-py's
        ``is_association`` returns the structural detail set instead.
        See :meth:`Table.is_association` for the full contract.

    Raises:
        DerivaMLTableNotFound: If ``table`` doesn't exist in any searchable
            schema (raised by :meth:`name_to_table`).

    Example:
        >>> bool(model.is_association("Dataset_Image"))  # doctest: +SKIP
        True
        >>> bool(model.is_association("Image"))  # doctest: +SKIP
        False
    """
    table = self.name_to_table(table)
    return table.is_association(unqualified=unqualified, pure=pure, min_arity=min_arity, max_arity=max_arity)

is_dataset_rid

is_dataset_rid(
    rid: RID, deleted: bool = False
) -> bool

Check whether rid identifies a (non-deleted) Dataset row.

Resolves rid against the live catalog via :meth:ErmrestCatalog.resolve_rid to determine which table it belongs to, then verifies it's the Dataset table. By default deleted datasets are treated as not-a-dataset; pass deleted=True to include tombstoned rows in the positive set.

Parameters:

Name Type Description Default
rid RID

The RID to test.

required
deleted bool

If True, return True for soft-deleted datasets too. Defaults to False (deleted rows return False).

False

Returns:

Type Description
bool

True if rid is a Dataset row (filtered by the deleted flag), False if it points at a different table.

Raises:

Type Description
DerivaMLException

If rid doesn't resolve in the catalog at all (typically an invalid or fabricated RID).

Example

>>> model.is_dataset_rid("1-abc123")  # doctest: +SKIP
True
>>> model.is_dataset_rid("1-image01")  # an Image RID  # doctest: +SKIP
False

Source code in src/deriva_ml/model/catalog.py
def is_dataset_rid(self, rid: RID, deleted: bool = False) -> bool:
    """Check whether ``rid`` identifies a (non-deleted) Dataset row.

    Resolves ``rid`` against the live catalog via
    :meth:`ErmrestCatalog.resolve_rid` to determine which table it
    belongs to, then verifies it's the ``Dataset`` table. By default
    deleted datasets are treated as not-a-dataset; pass ``deleted=True``
    to include tombstoned rows in the positive set.

    Args:
        rid: The RID to test.
        deleted: If True, return ``True`` for soft-deleted datasets
            too. Defaults to False (deleted rows return ``False``).

    Returns:
        True if ``rid`` is a Dataset row (filtered by the ``deleted``
        flag), False if it points at a different table.

    Raises:
        DerivaMLException: If ``rid`` doesn't resolve in the catalog
            at all (typically an invalid or fabricated RID).

    Example:
        >>> model.is_dataset_rid("1-abc123")  # doctest: +SKIP
        True
        >>> model.is_dataset_rid("1-image01")  # an Image RID  # doctest: +SKIP
        False
    """
    try:
        rid_info = self.model.catalog.resolve_rid(rid, self.model)
    except KeyError as e:
        # Chain the original error so the failed lookup stays visible.
        raise DerivaMLException(f"Invalid RID {rid}") from e
    if rid_info.table.name != "Dataset":
        return False
    elif deleted:
        # Caller accepts soft-deleted datasets, so any Dataset RID qualifies.
        return True
    else:
        # Otherwise fetch the row and report whether it is still live.
        return not list(rid_info.datapath.entities().fetch())[0]["Deleted"]
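Once the RID has resolved, the branch structure above reduces to a small truth table. The sketch below replaces the catalog lookup with plain arguments so the outcomes can be checked directly; `classify` is a hypothetical helper, not part of the class.

```python
def classify(table_name: str, row_deleted: bool, include_deleted: bool = False) -> bool:
    """Mirror the three branches taken after the RID has resolved."""
    if table_name != "Dataset":
        return False  # RID belongs to some other table
    if include_deleted:
        return True  # any Dataset row counts, tombstoned or not
    return not row_deleted  # only live Dataset rows count


# Every combination of inputs mapped to its outcome.
outcomes = {
    (name, row_deleted, include): classify(name, row_deleted, include)
    for name in ("Dataset", "Image")
    for row_deleted in (False, True)
    for include in (False, True)
}
```

The only case affected by the `deleted` flag is a soft-deleted Dataset row; every non-Dataset RID is rejected regardless.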

is_domain_schema

is_domain_schema(
    schema_name: str,
) -> bool

Check if a schema is a domain schema.

Parameters:

Name Type Description Default
schema_name str

Name of the schema to check.

required

Returns:

Type Description
bool

True if the schema is a domain schema.

Example

>>> model.is_domain_schema("my_domain")  # doctest: +SKIP
True
>>> model.is_domain_schema("deriva-ml")  # doctest: +SKIP
False

Source code in src/deriva_ml/model/catalog.py
def is_domain_schema(self, schema_name: str) -> bool:
    """Check if a schema is a domain schema.

    Args:
        schema_name: Name of the schema to check.

    Returns:
        True if the schema is a domain schema.

    Example:
        >>> model.is_domain_schema("my_domain")  # doctest: +SKIP
        True
        >>> model.is_domain_schema("deriva-ml")  # doctest: +SKIP
        False
    """
    return schema_name in self.domain_schemas

is_system_schema

is_system_schema(
    schema_name: str,
) -> bool

Check if a schema is a system or ML schema.

Parameters:

Name Type Description Default
schema_name str

Name of the schema to check.

required

Returns:

Type Description
bool

True if the schema is a system or ML schema.

Example

>>> model.is_system_schema("public")  # doctest: +SKIP
True
>>> model.is_system_schema("my_domain")  # doctest: +SKIP
False

Source code in src/deriva_ml/model/catalog.py
def is_system_schema(self, schema_name: str) -> bool:
    """Check if a schema is a system or ML schema.

    Args:
        schema_name: Name of the schema to check.

    Returns:
        True if the schema is a system or ML schema.

    Example:
        >>> model.is_system_schema("public")  # doctest: +SKIP
        True
        >>> model.is_system_schema("my_domain")  # doctest: +SKIP
        False
    """
    return _is_system_schema(schema_name, self.ml_schema)

is_vocabulary

is_vocabulary(
    table: TableInput,
) -> bool

Check if a given table is a controlled vocabulary table.

Delegates to Table.is_vocabulary() in deriva-py, which enforces both the required column names AND their types (ermrest_curie, ermrest_uri, text, markdown). The type check is stricter than a column-name-only check — a table with an ID column of the wrong type correctly returns False here where the legacy name-only implementation would have returned True.

Mirrors :meth:is_asset, which already delegates to Table.is_asset().

Parameters:

Name Type Description Default
table TableInput

An ERMrest Table object or the name of the table.

required

Returns:

Type Description
bool

True if the table has the structure of a controlled vocabulary, False otherwise.

Raises:

Type Description
DerivaMLTableNotFound

If the table doesn't exist in any searchable schema (raised by :meth:name_to_table).

Example

>>> model.is_vocabulary("Image_Class")  # doctest: +SKIP
True
>>> model.is_vocabulary("Image")  # doctest: +SKIP
False

Source code in src/deriva_ml/model/catalog.py
def is_vocabulary(self, table: TableInput) -> bool:
    """Check if a given table is a controlled vocabulary table.

    Delegates to ``Table.is_vocabulary()`` in deriva-py, which enforces both
    the required column names AND their types (ermrest_curie, ermrest_uri,
    text, markdown). The type check is stricter than a column-name-only
    check — a table with an ``ID`` column of the wrong type correctly
    returns False here where the legacy name-only implementation would
    have returned True.

    Mirrors :meth:`is_asset`, which already delegates to ``Table.is_asset()``.

    Args:
        table: An ERMrest Table object or the name of the table.

    Returns:
        True if the table has the structure of a controlled vocabulary,
        False otherwise.

    Raises:
        DerivaMLTableNotFound: If the table doesn't exist in any searchable
            schema (raised by :meth:`name_to_table`).

    Example:
        >>> model.is_vocabulary("Image_Class")  # doctest: +SKIP
        True
        >>> model.is_vocabulary("Image")  # doctest: +SKIP
        False
    """
    table = self.name_to_table(table)
    return table.is_vocabulary()

list_dataset_element_types

list_dataset_element_types() -> (
    list[Table]
)

List the deriva-py Table types that can be dataset members.

Walks Dataset.find_associations() and returns the other_fkey.pk_table for each association whose target is a domain-schema table or the Dataset table itself. Used by DerivaML.add_dataset_members to validate the kind of row a caller is trying to add to a dataset.

Returns:

Type Description
list[Table]

A list of :class:~deriva.core.ermrest_model.Table objects, one per valid member type.

Example

>>> [t.name for t in model.list_dataset_element_types()]  # doctest: +SKIP
['Image', 'Subject', 'Dataset']

Source code in src/deriva_ml/model/catalog.py
def list_dataset_element_types(self) -> list[Table]:
    """List the deriva-py ``Table`` types that can be dataset members.

    Walks ``Dataset.find_associations()`` and returns the
    ``other_fkey.pk_table`` for each association whose target is a
    domain-schema table or the Dataset table itself. Used by
    ``DerivaML.add_dataset_members`` to validate the kind of row
    a caller is trying to add to a dataset.

    Returns:
        A list of :class:`~deriva.core.ermrest_model.Table`
        objects — one per valid member type.

    Example:
        >>> [t.name for t in model.list_dataset_element_types()]  # doctest: +SKIP
        ['Image', 'Subject', 'Dataset']
    """

    dataset_table = self.name_to_table("Dataset")

    def is_domain_or_dataset_table(table: Table) -> bool:
        return self.is_domain_schema(table.schema.name) or table.name == dataset_table.name

    return [
        t
        for a in dataset_table.find_associations()
        if is_domain_or_dataset_table(t := a.other_fkeys.pop().pk_table)
    ]

lookup_feature

lookup_feature(
    table: TableInput, feature_name: str
) -> Feature

Look up the named feature on table.

Features are association tables (linking a target table to vocabulary terms, assets, and metadata) discovered by :meth:find_features. This is the by-name accessor.

Parameters:

Name Type Description Default
table TableInput

The target table the feature is attached to. Name or :class:Table.

required
feature_name str

The feature's name as set in its Feature_Name column.

required

Returns:

Type Description
Feature

The :class:Feature wrapper for the matching association.

Raises:

Type Description
DerivaMLTableNotFound

If table doesn't exist.

DerivaMLFeatureNotFound

If no feature with feature_name is defined on table.

Example

>>> feature = model.lookup_feature("Image", "Quality")  # doctest: +SKIP
>>> feature.feature_name  # doctest: +SKIP
'Quality'

Source code in src/deriva_ml/model/catalog.py
def lookup_feature(self, table: TableInput, feature_name: str) -> Feature:
    """Look up the named feature on ``table``.

    Features are association tables (linking a target table to
    vocabulary terms, assets, and metadata) discovered by
    :meth:`find_features`. This is the by-name accessor.

    Args:
        table: The target table the feature is attached to. Name or
            :class:`Table`.
        feature_name: The feature's name as set in its
            ``Feature_Name`` column.

    Returns:
        The :class:`Feature` wrapper for the matching association.

    Raises:
        DerivaMLTableNotFound: If ``table`` doesn't exist.
        DerivaMLFeatureNotFound: If no feature with
            ``feature_name`` is defined on ``table``.

    Example:
        >>> feature = model.lookup_feature("Image", "Quality")  # doctest: +SKIP
        >>> feature.feature_name  # doctest: +SKIP
        'Quality'
    """
    table = self.name_to_table(table)
    try:
        return [f for f in self.find_features(table) if f.feature_name == feature_name][0]
    except IndexError:
        raise DerivaMLFeatureNotFound(table.name, feature_name) from None

name_to_table

name_to_table(
    table: TableInput,
) -> Table

Return the table object corresponding to the given table name.

Searches domain schemas first (in sorted order), then ML schema, then WWW. If the table name appears in more than one schema, returns the first match.

Parameters:

Name Type Description Default
table TableInput

An ERMrest table object or a string that is the name of the table.

required

Returns:

Type Description
Table

Table object.

Raises:

Type Description
DerivaMLTableNotFound

If the table doesn't exist in any searchable schema.

Example

>>> image = model.name_to_table("Image")  # doctest: +SKIP
>>> image.name  # doctest: +SKIP
'Image'

Source code in src/deriva_ml/model/catalog.py
def name_to_table(self, table: TableInput) -> Table:
    """Return the table object corresponding to the given table name.

    Searches domain schemas first (in sorted order), then ML schema, then WWW.
    If the table name appears in more than one schema, returns the first match.

    Args:
      table: An ERMrest table object or a string that is the name of the table.

    Returns:
      Table object.

    Raises:
      DerivaMLTableNotFound: If the table doesn't exist in any searchable schema.

    Example:
        >>> image = model.name_to_table("Image")  # doctest: +SKIP
        >>> image.name  # doctest: +SKIP
        'Image'
    """
    if isinstance(table, Table):
        return table

    # Search domain schemas (sorted for deterministic order), then ML schema, then WWW
    search_order = [*sorted(self.domain_schemas), self.ml_schema, "WWW"]
    for sname in search_order:
        if sname not in self.model.schemas:
            continue
        s = self.model.schemas[sname]
        if table in s.tables:
            return s.tables[table]
    raise DerivaMLTableNotFound(str(table), msg="Table doesn't exist in any searchable schema")
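The first-match rule matters when two schemas define a table with the same name. The sketch below reproduces the search order from the listing above over plain dictionaries; `find_table` is a hypothetical stand-in that returns a qualified name instead of a `Table` and raises `LookupError` instead of `DerivaMLTableNotFound`.

```python
def find_table(
    schemas: dict[str, set[str]],
    domain_schemas: set[str],
    ml_schema: str,
    name: str,
) -> str:
    """Return 'schema.table' for the first schema in search order that has the table."""
    # Sorted domain schemas first, then the ML schema, then WWW.
    for sname in [*sorted(domain_schemas), ml_schema, "WWW"]:
        if name in schemas.get(sname, set()):
            return f"{sname}.{name}"
    raise LookupError(f"{name}: table doesn't exist in any searchable schema")


schemas = {
    "b_domain": {"Image"},
    "a_domain": {"Image", "Subject"},
    "deriva-ml": {"Dataset", "Image"},
    "WWW": {"Page"},
    "public": {"ERMrest_Client"},  # never searched
}
domains = {"b_domain", "a_domain"}

image_match = find_table(schemas, domains, "deriva-ml", "Image")
```

Here `Image` exists in three schemas, and the alphabetically first domain schema wins. A table that lives only in a schema outside the search order is reported as not found.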

refresh_model

refresh_model() -> None

Re-fetch the catalog model and replace self.model in place.

Calls catalog.getCatalogModel() and rebinds the result to self.model. Use this after a schema change (new table, column, or annotation) so subsequent introspection sees the current model.

Caching note: the asset-execution-table cache (_asset_execution_tables_cache) is keyed on the identity of self.model, so swapping the model out automatically invalidates it — the next call recomputes. The denormalize-planner cache (_planner_cache), if already built, keeps a reference to the previous model; if you depend on the planner reflecting a just-applied schema change, rebuild the instance rather than relying on refresh_model alone.

Returns:

Type Description
None

None. Mutates self.model as a side effect.

Example

>>> ml.create_vocabulary("Severity", "Lesion grade")  # doctest: +SKIP
>>> ml.refresh_model()  # pick up the new table  # doctest: +SKIP

Source code in src/deriva_ml/model/catalog.py
def refresh_model(self) -> None:
    """Re-fetch the catalog model and replace ``self.model`` in place.

    Calls ``catalog.getCatalogModel()`` and rebinds the result to
    ``self.model``. Use this after a schema change (new table, column,
    or annotation) so subsequent introspection sees the current model.

    Caching note: the asset-execution-table cache
    (``_asset_execution_tables_cache``) is keyed on the *identity* of
    ``self.model``, so swapping the model out automatically invalidates it
    — the next call recomputes. The denormalize-planner cache
    (``_planner_cache``), if already built, keeps a reference to the
    previous model; if you depend on the planner reflecting a just-applied
    schema change, rebuild the instance rather than relying on
    ``refresh_model`` alone.

    Returns:
        None. Mutates ``self.model`` as a side effect.

    Example:
        >>> ml.create_vocabulary("Severity", "Lesion grade")  # doctest: +SKIP
        >>> ml.refresh_model()  # pick up the new table  # doctest: +SKIP
    """
    self.model = self.catalog.getCatalogModel()
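
The caching note describes a cache keyed on the identity of `self.model`. How DerivaML implements that cache is not shown here; the following is a minimal sketch of the general idea, with `ModelHolder`, `table_names`, and the `fetch` callable all hypothetical.

```python
class ModelHolder:
    """Stand-in for an object that caches a value derived from self.model."""

    def __init__(self, model):
        self.model = model
        self._cache = None  # (model object the value was built from, value)
        self.computations = 0

    def table_names(self):
        # Recompute when the cached entry was built from a different model object.
        if self._cache is None or self._cache[0] is not self.model:
            self.computations += 1
            self._cache = (self.model, sorted(self.model))
        return self._cache[1]

    def refresh_model(self, fetch):
        # Rebinding self.model is enough to invalidate the identity-keyed cache.
        self.model = fetch()


holder = ModelHolder({"Image": {}})
holder.table_names()
holder.table_names()  # served from the cache, no recomputation
holder.refresh_model(lambda: {"Image": {}, "Severity": {}})
names = holder.table_names()  # recomputed against the new model object
```

A cache that instead stores a reference to the old model inside a longer-lived helper object (as the note says of the planner cache) would not notice the swap, which is why the note recommends rebuilding the instance in that case.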

vocab_columns

vocab_columns(
    table: TableInput,
) -> dict[str, str]

Return mapping from canonical vocab column name to actual column name.

Canonical names are TitleCase (Name, ID, URI, Description, Synonyms). Actual names reflect the table's schema — could be lowercase for FaceBase-style catalogs or TitleCase for DerivaML-native tables.

Parameters:

Name Type Description Default
table TableInput

A table object or the name of the table.

required

Returns:

Type Description
dict[str, str]

Dict mapping canonical name to actual column name in the table. E.g. {"Name": "name", "ID": "id", ...} for FaceBase tables or {"Name": "Name", "ID": "ID", ...} for DerivaML tables.

Raises:

Type Description
DerivaMLTableNotFound

If the table doesn't exist (raised by name_to_table).

Example

>>> model.vocab_columns("Image_Class")  # doctest: +SKIP
{'Name': 'Name', 'ID': 'ID', 'URI': 'URI', 'Description': 'Description', 'Synonyms': 'Synonyms'}

Source code in src/deriva_ml/model/catalog.py
def vocab_columns(self, table: TableInput) -> dict[str, str]:
    """Return mapping from canonical vocab column name to actual column name.

    Canonical names are TitleCase (Name, ID, URI, Description, Synonyms).
    Actual names reflect the table's schema — could be lowercase for
    FaceBase-style catalogs or TitleCase for DerivaML-native tables.

    Args:
        table: A table object or the name of the table.

    Returns:
        Dict mapping canonical name to actual column name in the table.
        E.g. ``{"Name": "name", "ID": "id", ...}`` for FaceBase tables
        or ``{"Name": "Name", "ID": "ID", ...}`` for DerivaML tables.

    Raises:
        DerivaMLTableNotFound: If the table doesn't exist (raised by
            :meth:`name_to_table`).

    Example:
        >>> model.vocab_columns("Image_Class")  # doctest: +SKIP
        {'Name': 'Name', 'ID': 'ID', 'URI': 'URI', 'Description': 'Description', 'Synonyms': 'Synonyms'}
    """
    table = self.name_to_table(table)
    col_map = {c.name.upper(): c.name for c in table.columns}
    return {canon: col_map[canon.upper()] for canon in ("Name", "ID", "URI", "Description", "Synonyms")}
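
The case-insensitive lookup can be exercised without a catalog. A minimal sketch, assuming a plain list of column names in place of `table.columns`; `map_vocab_columns` and the sample column list are illustrative only.

```python
CANONICAL = ("Name", "ID", "URI", "Description", "Synonyms")


def map_vocab_columns(column_names):
    """Map each canonical vocabulary column name to the table's actual spelling."""
    # Index the actual column names by their upper-cased form...
    col_map = {name.upper(): name for name in column_names}
    # ...then look each canonical name up case-insensitively.
    return {canon: col_map[canon.upper()] for canon in CANONICAL}


# Hypothetical FaceBase-style table with lowercase vocabulary columns.
facebase_style = map_vocab_columns(
    ["RID", "id", "uri", "name", "description", "synonyms"]
)
```

In this sketch a table that lacks one of the five columns raises `KeyError`, because the mapping is indexed directly. The listing above indexes `col_map` the same way, so callers passing a non-vocabulary table may want to guard for that.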

Display dataclass

Bases: AnnotationBuilder

Display annotation for tables and columns.

Controls the display name, description/tooltip, and how null values and foreign key links are rendered. Can be applied to both tables and columns.

Parameters:

Name Type Description Default
name str | None

Display name shown in the UI (mutually exclusive with markdown_name)

None
markdown_name str | None

Markdown-formatted display name (mutually exclusive with name)

None
name_style NameStyle | None

Styling options for automatic name formatting

None
comment str | None

Description text shown as tooltip/help text

None
show_null dict[str, bool | str] | None

How to display null values, per context

None
show_foreign_key_link dict[str, bool] | None

Whether to show FK values as links, per context

None

Raises:

Type Description
ValueError

If both name and markdown_name are provided

Example

Build the annotation, then stage it on the table and push it to the catalog. The apply path is the same for every builder: table.annotations[Builder.tag] = builder.to_dict() followed by ml.apply_annotations().

>>> display = Display(name="Research Subjects")  # doctest: +SKIP
>>> table.annotations[Display.tag] = display.to_dict()  # doctest: +SKIP
>>> ml.apply_annotations()  # doctest: +SKIP

With description/tooltip:

>>> display = Display(  # doctest: +SKIP
...     name="Subjects",
...     comment="Individuals enrolled in research studies"
... )

Markdown-formatted name:

>>> display = Display(markdown_name="**Bold** _Italic_ Name")  # doctest: +SKIP

Context-specific null display:

>>> from deriva_ml.model import CONTEXT_COMPACT, CONTEXT_DETAILED  # doctest: +SKIP
>>> display = Display(  # doctest: +SKIP
...     name="Value",
...     show_null={
...         CONTEXT_COMPACT: False,      # Hide nulls in lists
...         CONTEXT_DETAILED: '"N/A"'    # Show "N/A" string
...     }
... )

Control foreign key link display:

>>> display = Display(  # doctest: +SKIP
...     name="Subject",
...     show_foreign_key_link={CONTEXT_COMPACT: False}
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class Display(AnnotationBuilder):
    """Display annotation for tables and columns.

    Controls the display name, description/tooltip, and how null values
    and foreign key links are rendered. Can be applied to both tables
    and columns.

    Args:
        name: Display name shown in the UI (mutually exclusive with markdown_name)
        markdown_name: Markdown-formatted display name (mutually exclusive with name)
        name_style: Styling options for automatic name formatting
        comment: Description text shown as tooltip/help text
        show_null: How to display null values, per context
        show_foreign_key_link: Whether to show FK values as links, per context

    Raises:
        ValueError: If both name and markdown_name are provided

    Example:
        Build the annotation, then stage it on the table and push to
        the catalog (the apply path is the same for every builder —
        ``table.annotations[Builder.tag] = builder.to_dict()`` followed
        by ``ml.apply_annotations()``)::

            >>> display = Display(name="Research Subjects")  # doctest: +SKIP
            >>> table.annotations[Display.tag] = display.to_dict()  # doctest: +SKIP
            >>> ml.apply_annotations()  # doctest: +SKIP

        With description/tooltip::

            >>> display = Display(  # doctest: +SKIP
            ...     name="Subjects",
            ...     comment="Individuals enrolled in research studies"
            ... )

        Markdown-formatted name::

            >>> display = Display(markdown_name="**Bold** _Italic_ Name")  # doctest: +SKIP

        Context-specific null display::

            >>> from deriva_ml.model import CONTEXT_COMPACT, CONTEXT_DETAILED  # doctest: +SKIP
            >>> display = Display(  # doctest: +SKIP
            ...     name="Value",
            ...     show_null={
            ...         CONTEXT_COMPACT: False,      # Hide nulls in lists
            ...         CONTEXT_DETAILED: '"N/A"'    # Show "N/A" string
            ...     }
            ... )

        Control foreign key link display::

            >>> display = Display(  # doctest: +SKIP
            ...     name="Subject",
            ...     show_foreign_key_link={CONTEXT_COMPACT: False}
            ... )
    """

    tag = TAG_DISPLAY

    name: str | None = None
    markdown_name: str | None = None
    name_style: NameStyle | None = None
    comment: str | None = None
    show_null: dict[str, bool | str] | None = None
    show_foreign_key_link: dict[str, bool] | None = None

    def __post_init__(self):
        if self.name and self.markdown_name:
            raise ValueError("name and markdown_name are mutually exclusive")

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.name is not None:
            result["name"] = self.name
        if self.markdown_name is not None:
            result["markdown_name"] = self.markdown_name
        if self.name_style is not None:
            style_dict = self.name_style.to_dict()
            if style_dict:
                result["name_style"] = style_dict
        if self.comment is not None:
            result["comment"] = self.comment
        if self.show_null is not None:
            result["show_null"] = self.show_null
        if self.show_foreign_key_link is not None:
            result["show_foreign_key_link"] = self.show_foreign_key_link
        return result
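
The validation and the omit-`None` rule in `to_dict` can be seen in a trimmed form. A minimal sketch, assuming only three of the fields; `DisplaySketch` is a hypothetical stand-in, not the library class.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class DisplaySketch:
    """Trimmed stand-in for Display: same validation, same omit-None rule."""

    name: Optional[str] = None
    markdown_name: Optional[str] = None
    comment: Optional[str] = None

    def __post_init__(self):
        # Truthiness check, mirroring the listing above.
        if self.name and self.markdown_name:
            raise ValueError("name and markdown_name are mutually exclusive")

    def to_dict(self) -> dict[str, Any]:
        fields = {
            "name": self.name,
            "markdown_name": self.markdown_name,
            "comment": self.comment,
        }
        # Only fields that were explicitly set end up in the annotation payload.
        return {key: value for key, value in fields.items() if value is not None}


payload = DisplaySketch(
    name="Subjects", comment="Individuals enrolled in research studies"
).to_dict()
```

One consequence of the truthiness check: an empty-string `name` passed together with `markdown_name` does not raise in this sketch, and both keys are then emitted. The listing above uses the same check, so the same appears to hold there.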

Facet dataclass

A facet definition for filtering.

Parameters:

Name Type Description Default
source str | list[str | InboundFK | OutboundFK] | None

Path to source data

None
sourcekey str | None

Reference to named source

None
markdown_name str | None

Display name

None
comment str | None

Description

None
entity bool | None

Whether this is an entity facet

None
open bool | None

Start expanded

None
ux_mode FacetUxMode | None

UI mode (choices, ranges, check_presence)

None
bar_plot bool | None

Show bar plot

None
choices list[Any] | None

Preset choice values

None
ranges list[FacetRange] | None

Preset range values

None
not_null bool | None

Filter to non-null values

None
hide_null_choice bool | None

Hide "null" option

None
hide_not_null_choice bool | None

Hide "not null" option

None
n_bins int | None

Number of bins for histogram

None
Source code in src/deriva_ml/model/annotations.py
@dataclass
class Facet:
    """A facet definition for filtering.

    Args:
        source: Path to source data
        sourcekey: Reference to named source
        markdown_name: Display name
        comment: Description
        entity: Whether this is an entity facet
        open: Start expanded
        ux_mode: UI mode (choices, ranges, check_presence)
        bar_plot: Show bar plot
        choices: Preset choice values
        ranges: Preset range values
        not_null: Filter to non-null values
        hide_null_choice: Hide "null" option
        hide_not_null_choice: Hide "not null" option
        n_bins: Number of bins for histogram
    """

    source: str | list[str | InboundFK | OutboundFK] | None = None
    sourcekey: str | None = None
    markdown_name: str | None = None
    comment: str | None = None
    entity: bool | None = None
    open: bool | None = None
    ux_mode: FacetUxMode | None = None
    bar_plot: bool | None = None
    choices: list[Any] | None = None
    ranges: list[FacetRange] | None = None
    not_null: bool | None = None
    hide_null_choice: bool | None = None
    hide_not_null_choice: bool | None = None
    n_bins: int | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}

        if self.source is not None:
            if isinstance(self.source, str):
                result["source"] = self.source
            else:
                result["source"] = [item.to_dict() if hasattr(item, "to_dict") else item for item in self.source]

        if self.sourcekey is not None:
            result["sourcekey"] = self.sourcekey
        if self.markdown_name is not None:
            result["markdown_name"] = self.markdown_name
        if self.comment is not None:
            result["comment"] = self.comment
        if self.entity is not None:
            result["entity"] = self.entity
        if self.open is not None:
            result["open"] = self.open
        if self.ux_mode is not None:
            result["ux_mode"] = self.ux_mode.value
        if self.bar_plot is not None:
            result["bar_plot"] = self.bar_plot
        if self.choices is not None:
            result["choices"] = self.choices
        if self.ranges is not None:
            result["ranges"] = [r.to_dict() for r in self.ranges]
        if self.not_null is not None:
            result["not_null"] = self.not_null
        if self.hide_null_choice is not None:
            result["hide_null_choice"] = self.hide_null_choice
        if self.hide_not_null_choice is not None:
            result["hide_not_null_choice"] = self.hide_not_null_choice
        if self.n_bins is not None:
            result["n_bins"] = self.n_bins

        return result

FacetList dataclass

A list of facets for filtering (visible_columns.filter).

Example

>>> facets = FacetList([  # doctest: +SKIP
...     Facet(source="Species", open=True),
...     Facet(source="Age", ux_mode=FacetUxMode.RANGES)
... ])

Source code in src/deriva_ml/model/annotations.py
@dataclass
class FacetList:
    """A list of facets for filtering (visible_columns.filter).

    Example:
        >>> facets = FacetList([  # doctest: +SKIP
        ...     Facet(source="Species", open=True),
        ...     Facet(source="Age", ux_mode=FacetUxMode.RANGES)
        ... ])
    """

    facets: list[Facet] = field(default_factory=list)

    def add(self, facet: Facet) -> "FacetList":
        """Add a facet to the list."""
        self.facets.append(facet)
        return self

    def to_dict(self) -> dict[str, list[dict]]:
        return {"and": [f.to_dict() for f in self.facets]}

add

add(facet: Facet) -> 'FacetList'

Add a facet to the list.

Source code in src/deriva_ml/model/annotations.py
def add(self, facet: Facet) -> "FacetList":
    """Add a facet to the list."""
    self.facets.append(facet)
    return self
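
Because `add` returns the list itself, calls can be chained, and `to_dict` wraps the facets in an `"and"` key. A minimal sketch of both behaviours, assuming trimmed stand-ins (`FacetSketch`, `FacetListSketch`) with only three facet fields and a plain string in place of `FacetUxMode`.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FacetSketch:
    source: Optional[str] = None
    open: Optional[bool] = None
    ux_mode: Optional[str] = None  # the real class takes a FacetUxMode member

    def to_dict(self) -> dict[str, Any]:
        # Emit only the fields that were set.
        return {k: v for k, v in vars(self).items() if v is not None}


@dataclass
class FacetListSketch:
    facets: list = field(default_factory=list)

    def add(self, facet):
        self.facets.append(facet)
        return self  # returning self is what makes chaining possible

    def to_dict(self):
        return {"and": [f.to_dict() for f in self.facets]}


payload = (
    FacetListSketch()
    .add(FacetSketch(source="Species", open=True))
    .add(FacetSketch(source="Age", ux_mode="ranges"))
    .to_dict()
)
```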

FacetRange dataclass

A range for facet filtering.

Parameters:

Name Type Description Default
min float | None

Minimum value

None
max float | None

Maximum value

None
min_exclusive bool | None

Exclude min value

None
max_exclusive bool | None

Exclude max value

None
Source code in src/deriva_ml/model/annotations.py
@dataclass
class FacetRange:
    """A range for facet filtering.

    Args:
        min: Minimum value
        max: Maximum value
        min_exclusive: Exclude min value
        max_exclusive: Exclude max value
    """

    min: float | None = None
    max: float | None = None
    min_exclusive: bool | None = None
    max_exclusive: bool | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.min is not None:
            result["min"] = self.min
        if self.max is not None:
            result["max"] = self.max
        if self.min_exclusive is not None:
            result["min_exclusive"] = self.min_exclusive
        if self.max_exclusive is not None:
            result["max_exclusive"] = self.max_exclusive
        return result
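
The `min_exclusive` and `max_exclusive` flags turn closed bounds into open ones. The filtering itself happens in the client, not in this class; the sketch below only illustrates the intended interval semantics, under the assumption that an unset bound means unbounded. `in_range` and its parameter names are hypothetical.

```python
def in_range(value, lo=None, hi=None, lo_exclusive=False, hi_exclusive=False):
    """Return True if value falls inside the (possibly half-open) range."""
    # A missing bound leaves that side of the range open-ended.
    if lo is not None and (value <= lo if lo_exclusive else value < lo):
        return False
    if hi is not None and (value >= hi if hi_exclusive else value > hi):
        return False
    return True


closed = in_range(18, lo=0, hi=18)                       # bounds included
half_open = in_range(18, lo=0, hi=18, hi_exclusive=True)  # upper bound excluded
unbounded = in_range(100, lo=65)                          # no upper bound
```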

FacetUxMode

Bases: str, Enum

UX modes for facet filters in the search panel.

Controls how users interact with a facet filter.

Attributes:

Name Type Description
CHOICES

Checkbox list for selecting values

RANGES

Range slider/inputs for numeric or date ranges

CHECK_PRESENCE

Check if value exists or is null

Example

>>> # Choice-based facet
>>> Facet(source="Status", ux_mode=FacetUxMode.CHOICES)  # doctest: +SKIP
>>>
>>> # Range-based facet for numeric values
>>> Facet(source="Age", ux_mode=FacetUxMode.RANGES)  # doctest: +SKIP
>>>
>>> # Check presence (has value / no value)
>>> Facet(source="Notes", ux_mode=FacetUxMode.CHECK_PRESENCE)  # doctest: +SKIP

Source code in src/deriva_ml/model/annotations.py
class FacetUxMode(str, Enum):
    """UX modes for facet filters in the search panel.

    Controls how users interact with a facet filter.

    Attributes:
        CHOICES: Checkbox list for selecting values
        RANGES: Range slider/inputs for numeric or date ranges
        CHECK_PRESENCE: Check if value exists or is null

    Example:
        >>> # Choice-based facet
        >>> Facet(source="Status", ux_mode=FacetUxMode.CHOICES)  # doctest: +SKIP
        >>>
        >>> # Range-based facet for numeric values
        >>> Facet(source="Age", ux_mode=FacetUxMode.RANGES)  # doctest: +SKIP
        >>>
        >>> # Check presence (has value / no value)
        >>> Facet(source="Notes", ux_mode=FacetUxMode.CHECK_PRESENCE)  # doctest: +SKIP
    """

    CHOICES = "choices"
    RANGES = "ranges"
    CHECK_PRESENCE = "check_presence"
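
Deriving from both `str` and `Enum` means each member behaves like its string value. A minimal sketch of what that buys, using a local copy of the enum (`UxModeSketch`) so it runs without DerivaML installed.

```python
import json
from enum import Enum


class UxModeSketch(str, Enum):
    """Local copy of the three members, for illustration only."""

    CHOICES = "choices"
    RANGES = "ranges"
    CHECK_PRESENCE = "check_presence"


mode = UxModeSketch("ranges")             # look a member up by its string value
same_as_str = mode == "ranges"            # members compare equal to plain strings
encoded = json.dumps({"ux_mode": mode})   # serializes as the underlying string
```

`Facet.to_dict` still writes `self.ux_mode.value` explicitly, which keeps the payload a plain `str` rather than an enum member.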

InboundFK dataclass

An inbound foreign key path step for pseudo-column source paths.

Use this when following a foreign key FROM another table TO the current table. This is common when counting or aggregating related records.

Parameters:

Name Type Description Default
schema str

Schema name containing the FK constraint

required
constraint str

Foreign key constraint name

required
Example

Count images related to a subject (Image has FK to Subject):

>>> # In Subject table, count related images
>>> pc = PseudoColumn(  # doctest: +SKIP
...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
...     aggregate=Aggregate.CNT,
...     markdown_name="Image Count"
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class InboundFK:
    """An inbound foreign key path step for pseudo-column source paths.

    Use this when following a foreign key FROM another table TO the current table.
    This is common when counting or aggregating related records.

    Args:
        schema: Schema name containing the FK constraint
        constraint: Foreign key constraint name

    Example:
        Count images related to a subject (Image has FK to Subject)::

            >>> # In Subject table, count related images
            >>> pc = PseudoColumn(  # doctest: +SKIP
            ...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
            ...     aggregate=Aggregate.CNT,
            ...     markdown_name="Image Count"
            ... )
    """

    schema: str
    constraint: str

    def to_dict(self) -> dict[str, list[str]]:
        return {"inbound": [self.schema, self.constraint]}

NameStyle dataclass

Styling options for automatic display name formatting.

Applied to table or column names when no explicit display name is set.

Parameters:

Name Type Description Default
underline_space bool | None

Replace underscores with spaces (e.g., "First_Name" -> "First Name")

None
title_case bool | None

Apply title case formatting (e.g., "firstname" -> "Firstname")

None
markdown bool | None

Render the name as markdown

None
Example

>>> # Transform "Subject_ID" to "Subject Id" with title case
>>> display = Display(  # doctest: +SKIP
...     name_style=NameStyle(underline_space=True, title_case=True)
... )

Source code in src/deriva_ml/model/annotations.py
@dataclass
class NameStyle:
    """Styling options for automatic display name formatting.

    Applied to table or column names when no explicit display name is set.

    Args:
        underline_space: Replace underscores with spaces (e.g., "First_Name" -> "First Name")
        title_case: Apply title case formatting (e.g., "firstname" -> "Firstname")
        markdown: Render the name as markdown

    Example:
        >>> # Transform "Subject_ID" to "Subject Id" with title case
        >>> display = Display(  # doctest: +SKIP
        ...     name_style=NameStyle(underline_space=True, title_case=True)
        ... )
    """

    underline_space: bool | None = None
    title_case: bool | None = None
    markdown: bool | None = None

    def to_dict(self) -> dict[str, bool]:
        """Convert to dictionary, excluding None values."""
        result = {}
        if self.underline_space is not None:
            result["underline_space"] = self.underline_space
        if self.title_case is not None:
            result["title_case"] = self.title_case
        if self.markdown is not None:
            result["markdown"] = self.markdown
        return result

to_dict

to_dict() -> dict[str, bool]

Convert to dictionary, excluding None values.

Source code in src/deriva_ml/model/annotations.py
def to_dict(self) -> dict[str, bool]:
    """Convert to dictionary, excluding None values."""
    result = {}
    if self.underline_space is not None:
        result["underline_space"] = self.underline_space
    if self.title_case is not None:
        result["title_case"] = self.title_case
    if self.markdown is not None:
        result["markdown"] = self.markdown
    return result
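
The effect of `underline_space` and `title_case` on a raw name can be approximated in Python. The actual rendering is done by the web client, so this is only a rough preview under that assumption; `preview_name` is a hypothetical helper, and Python's `str.title` may differ from the client in edge cases.

```python
def preview_name(raw, underline_space=False, title_case=False):
    """Approximate how a table or column name would be displayed."""
    shown = raw
    if underline_space:
        shown = shown.replace("_", " ")  # "First_Name" -> "First Name"
    if title_case:
        shown = shown.title()            # "firstname" -> "Firstname"
    return shown


both = preview_name("Subject_ID", underline_space=True, title_case=True)
```

With both options on, `Subject_ID` previews as `Subject Id`, matching the comment in the docstring example.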

OutboundFK dataclass

An outbound foreign key path step for pseudo-column source paths.

Use this when following a foreign key FROM the current table TO another table. This is common when displaying values from referenced tables.

Parameters:

Name Type Description Default
schema str

Schema name containing the FK constraint

required
constraint str

Foreign key constraint name

required
Example

Show species name from a related Species table:

>>> # Subject has FK to Species, display Species.Name
>>> pc = PseudoColumn(  # doctest: +SKIP
...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
...     markdown_name="Species"
... )

Chain multiple outbound FKs:

>>> # Image -> Subject -> Species
>>> pc = PseudoColumn(  # doctest: +SKIP
...     source=[
...         OutboundFK("domain", "Image_Subject_fkey"),
...         OutboundFK("domain", "Subject_Species_fkey"),
...         "Name"
...     ],
...     markdown_name="Species"
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class OutboundFK:
    """An outbound foreign key path step for pseudo-column source paths.

    Use this when following a foreign key FROM the current table TO another table.
    This is common when displaying values from referenced tables.

    Args:
        schema: Schema name containing the FK constraint
        constraint: Foreign key constraint name

    Example:
        Show species name from a related Species table::

            >>> # Subject has FK to Species, display Species.Name
            >>> pc = PseudoColumn(  # doctest: +SKIP
            ...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
            ...     markdown_name="Species"
            ... )

        Chain multiple outbound FKs::

            >>> # Image -> Subject -> Species
            >>> pc = PseudoColumn(  # doctest: +SKIP
            ...     source=[
            ...         OutboundFK("domain", "Image_Subject_fkey"),
            ...         OutboundFK("domain", "Subject_Species_fkey"),
            ...         "Name"
            ...     ],
            ...     markdown_name="Species"
            ... )
    """

    schema: str
    constraint: str

    def to_dict(self) -> dict[str, list[str]]:
        return {"outbound": [self.schema, self.constraint]}
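
A source path mixes FK steps and a final column name, and each FK step serializes to a one-key dictionary. A minimal sketch of the resulting structure, assuming local stand-ins (`InboundSketch`, `OutboundSketch`) and a hypothetical `serialize_source` helper that mirrors the list comprehension used by `Facet.to_dict`.

```python
from dataclasses import dataclass


@dataclass
class InboundSketch:
    schema: str
    constraint: str

    def to_dict(self):
        return {"inbound": [self.schema, self.constraint]}


@dataclass
class OutboundSketch:
    schema: str
    constraint: str

    def to_dict(self):
        return {"outbound": [self.schema, self.constraint]}


def serialize_source(path):
    # FK steps know how to serialize themselves; plain column names pass through.
    return [step.to_dict() if hasattr(step, "to_dict") else step for step in path]


# Image -> Subject -> Species, ending on the Name column.
two_hops = serialize_source(
    [
        OutboundSketch("domain", "Image_Subject_fkey"),
        OutboundSketch("domain", "Subject_Species_fkey"),
        "Name",
    ]
)
# From Subject, follow the FK that Image holds back to it.
inbound = serialize_source([InboundSketch("domain", "Image_Subject_fkey"), "RID"])
```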

PreFormat dataclass

Pre-formatting options for column values.

Parameters:

Name Type Description Default
format str | None

Printf-style format string (e.g., "%.2f")

None
bool_true_value str | None

Display value for True

None
bool_false_value str | None

Display value for False

None
Source code in src/deriva_ml/model/annotations.py
@dataclass
class PreFormat:
    """Pre-formatting options for column values.

    Args:
        format: Printf-style format string (e.g., "%.2f")
        bool_true_value: Display value for True
        bool_false_value: Display value for False
    """

    format: str | None = None
    bool_true_value: str | None = None
    bool_false_value: str | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.format is not None:
            result["format"] = self.format
        if self.bool_true_value is not None:
            result["bool_true_value"] = self.bool_true_value
        if self.bool_false_value is not None:
            result["bool_false_value"] = self.bool_false_value
        return result
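
To get a feel for what the three options produce, the formatting can be previewed locally. The real formatting is applied by the client when it renders the column; this sketch assumes Python's `%` operator is a close enough stand-in for printf-style formatting, and `preview_preformat` is a hypothetical helper.

```python
def preview_preformat(value, fmt=None, bool_true_value=None, bool_false_value=None):
    """Approximate the displayed text for a column value."""
    if isinstance(value, bool):
        # Booleans use the replacement labels when one is configured.
        label = bool_true_value if value else bool_false_value
        return label if label is not None else str(value)
    if fmt is not None:
        return fmt % value  # printf-style, e.g. "%.2f"
    return str(value)


rounded = preview_preformat(3.14159, fmt="%.2f")
flag = preview_preformat(True, bool_true_value="Yes", bool_false_value="No")
```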

PseudoColumn dataclass

A pseudo-column definition for visible columns and foreign keys.

Pseudo-columns display computed values, values from related tables, or custom markdown patterns. They appear as columns in table views but are not actual database columns.

Parameters:

Name Type Description Default
source str | list[str | InboundFK | OutboundFK] | None

Path to source data: either a column name (string) or a list of FK path steps ending with a column name

None
sourcekey str | None

Reference to a named source in source-definitions annotation

None
markdown_name str | None

Display name for the column (supports markdown)

None
comment str | Literal[False] | None

Description/tooltip text (or False to hide)

None
entity bool | None

Whether this represents an entity (affects rendering)

None
aggregate Aggregate | None

Aggregation function when source returns multiple values

None
self_link bool | None

Make the value a link to the current row

None
display PseudoColumnDisplay | None

Display formatting options

None
array_options dict[str, Any] | None

Options for array aggregates (max_length, order)

None
Note

source and sourcekey are mutually exclusive. Use source for inline definitions, sourcekey to reference pre-defined sources.

Raises:

Type Description
ValueError

If both source and sourcekey are provided

Example

Simple column with custom display name:

>>> PseudoColumn(source="Internal_ID", markdown_name="ID")  # doctest: +SKIP

Outbound FK traversal (display value from referenced table):

>>> # Subject has FK to Species - show Species.Name
>>> PseudoColumn(  # doctest: +SKIP
...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
...     markdown_name="Species"
... )

Inbound FK with aggregation (count related records):

>>> # Count images pointing to this subject
>>> PseudoColumn(  # doctest: +SKIP
...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
...     aggregate=Aggregate.CNT,
...     markdown_name="Images"
... )

Multi-hop FK path:

>>> # Image -> Subject -> Species
>>> PseudoColumn(  # doctest: +SKIP
...     source=[
...         OutboundFK("domain", "Image_Subject_fkey"),
...         OutboundFK("domain", "Subject_Species_fkey"),
...         "Name"
...     ],
...     markdown_name="Species"
... )

With custom display formatting:

>>> PseudoColumn(  # doctest: +SKIP
...     source="URL",
...     display=PseudoColumnDisplay(
...         markdown_pattern="[Download]({{{_value}}})",
...         show_foreign_key_link=False
...     )
... )

Array aggregate with display options:

>>> PseudoColumn(  # doctest: +SKIP
...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
...     aggregate=Aggregate.ARRAY_D,
...     display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV),
...     markdown_name="Tags"
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class PseudoColumn:
    """A pseudo-column definition for visible columns and foreign keys.

    Pseudo-columns display computed values, values from related tables,
    or custom markdown patterns. They appear as columns in table views
    but are not actual database columns.

    Args:
        source: Path to source data. Can be:
            - A column name (string)
            - A list of FK path steps ending with a column name
        sourcekey: Reference to a named source in source-definitions annotation
        markdown_name: Display name for the column (supports markdown)
        comment: Description/tooltip text (or False to hide)
        entity: Whether this represents an entity (affects rendering)
        aggregate: Aggregation function when source returns multiple values
        self_link: Make the value a link to the current row
        display: Display formatting options
        array_options: Options for array aggregates (max_length, order)

    Note:
        source and sourcekey are mutually exclusive. Use source for inline
        definitions, sourcekey to reference pre-defined sources.

    Raises:
        ValueError: If both source and sourcekey are provided

    Example:
        Simple column with custom display name::

            >>> PseudoColumn(source="Internal_ID", markdown_name="ID")  # doctest: +SKIP

        Outbound FK traversal (display value from referenced table)::

            >>> # Subject has FK to Species - show Species.Name
            >>> PseudoColumn(  # doctest: +SKIP
            ...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
            ...     markdown_name="Species"
            ... )

        Inbound FK with aggregation (count related records)::

            >>> # Count images pointing to this subject
            >>> PseudoColumn(  # doctest: +SKIP
            ...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
            ...     aggregate=Aggregate.CNT,
            ...     markdown_name="Images"
            ... )

        Multi-hop FK path::

            >>> # Image -> Subject -> Species
            >>> PseudoColumn(  # doctest: +SKIP
            ...     source=[
            ...         OutboundFK("domain", "Image_Subject_fkey"),
            ...         OutboundFK("domain", "Subject_Species_fkey"),
            ...         "Name"
            ...     ],
            ...     markdown_name="Species"
            ... )

        With custom display formatting::

            >>> PseudoColumn(  # doctest: +SKIP
            ...     source="URL",
            ...     display=PseudoColumnDisplay(
            ...         markdown_pattern="[Download]({{{_value}}})",
            ...         show_foreign_key_link=False
            ...     )
            ... )

        Array aggregate with display options::

            >>> PseudoColumn(  # doctest: +SKIP
            ...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
            ...     aggregate=Aggregate.ARRAY_D,
            ...     display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV),
            ...     markdown_name="Tags"
            ... )
    """

    source: str | list[str | InboundFK | OutboundFK] | None = None
    sourcekey: str | None = None
    markdown_name: str | None = None
    comment: str | Literal[False] | None = None
    entity: bool | None = None
    aggregate: Aggregate | None = None
    self_link: bool | None = None
    display: PseudoColumnDisplay | None = None
    array_options: dict[str, Any] | None = None  # Can be complex

    def __post_init__(self):
        if self.source is not None and self.sourcekey is not None:
            raise ValueError("source and sourcekey are mutually exclusive")

    def to_dict(self) -> dict[str, Any]:
        result = {}

        if self.source is not None:
            if isinstance(self.source, str):
                result["source"] = self.source
            else:
                # Convert path elements
                result["source"] = [item.to_dict() if hasattr(item, "to_dict") else item for item in self.source]

        if self.sourcekey is not None:
            result["sourcekey"] = self.sourcekey
        if self.markdown_name is not None:
            result["markdown_name"] = self.markdown_name
        if self.comment is not None:
            result["comment"] = self.comment
        if self.entity is not None:
            result["entity"] = self.entity
        if self.aggregate is not None:
            result["aggregate"] = self.aggregate.value
        if self.self_link is not None:
            result["self_link"] = self.self_link
        if self.display is not None:
            result["display"] = self.display.to_dict()
        if self.array_options is not None:
            result["array_options"] = self.array_options

        return result
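
The serialization rules in `PseudoColumn.to_dict()` can be illustrated with a minimal, self-contained sketch. `MiniInboundFK` and `MiniPseudoColumn` are trimmed stand-ins written for this illustration, not the library classes, so the snippet runs without deriva_ml installed. The `{"inbound": [schema, constraint]}` shape and the `"cnt"` aggregate string are assumptions based on the Chaise source-path syntax; the real `InboundFK.to_dict()` and the `Aggregate` values are not shown in this section.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class MiniInboundFK:
    """Hypothetical stand-in for InboundFK; the output shape is an assumption."""

    schema: str
    constraint: str

    def to_dict(self) -> dict[str, Any]:
        return {"inbound": [self.schema, self.constraint]}


@dataclass
class MiniPseudoColumn:
    """Trimmed stand-in that mirrors the PseudoColumn logic listed above."""

    source: str | list[Any] | None = None
    sourcekey: str | None = None
    markdown_name: str | None = None
    aggregate: str | None = None  # the library uses the Aggregate enum here

    def __post_init__(self) -> None:
        # Same guard as the listing: a column is defined by a path or a key, not both.
        if self.source is not None and self.sourcekey is not None:
            raise ValueError("source and sourcekey are mutually exclusive")

    def to_dict(self) -> dict[str, Any]:
        result: dict[str, Any] = {}
        if self.source is not None:
            if isinstance(self.source, str):
                result["source"] = self.source
            else:
                # Path elements with a to_dict() are converted; plain strings pass through.
                result["source"] = [
                    item.to_dict() if hasattr(item, "to_dict") else item
                    for item in self.source
                ]
        # Unset (None) fields are left out of the annotation entirely.
        for key in ("sourcekey", "markdown_name", "aggregate"):
            value = getattr(self, key)
            if value is not None:
                result[key] = value
        return result


image_count = MiniPseudoColumn(
    source=[MiniInboundFK("domain", "Image_Subject_fkey"), "RID"],
    aggregate="cnt",
    markdown_name="Images",
).to_dict()
print(image_count)

try:
    MiniPseudoColumn(source="URL", sourcekey="url_key")
except ValueError as exc:
    conflict_message = str(exc)
    print(conflict_message)
```

Fields that were never set do not appear in the result, which keeps the stored annotation small and lets the UI fall back to its own defaults.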

PseudoColumnDisplay dataclass

Display options for a pseudo-column.

Parameters:

Name                   Type                                   Description                                      Default
markdown_pattern       str | None                             Handlebars/mustache template                     None
template_engine        TemplateEngine | None                  Template engine to use                           None
show_foreign_key_link  bool | None                            Show as clickable link                           None
array_ux_mode          ArrayUxMode | None                     How to render array values                       None
column_order           list[SortKey] | Literal[False] | None  Sort order for the column, or False to disable   None
wait_for               list[str] | None                       Template variables to wait for before rendering  None

Source code in src/deriva_ml/model/annotations.py
@dataclass
class PseudoColumnDisplay:
    """Display options for a pseudo-column.

    Args:
        markdown_pattern: Handlebars/mustache template
        template_engine: Template engine to use
        show_foreign_key_link: Show as clickable link
        array_ux_mode: How to render array values
        column_order: Sort order for the column, or False to disable
        wait_for: Template variables to wait for before rendering
    """

    markdown_pattern: str | None = None
    template_engine: TemplateEngine | None = None
    show_foreign_key_link: bool | None = None
    array_ux_mode: ArrayUxMode | None = None
    column_order: list[SortKey] | Literal[False] | None = None
    wait_for: list[str] | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.markdown_pattern is not None:
            result["markdown_pattern"] = self.markdown_pattern
        if self.template_engine is not None:
            result["template_engine"] = self.template_engine.value
        if self.show_foreign_key_link is not None:
            result["show_foreign_key_link"] = self.show_foreign_key_link
        if self.array_ux_mode is not None:
            result["array_ux_mode"] = self.array_ux_mode.value
        if self.column_order is not None:
            if self.column_order is False:
                result["column_order"] = False
            else:
                result["column_order"] = [k.to_dict() if isinstance(k, SortKey) else k for k in self.column_order]
        if self.wait_for is not None:
            result["wait_for"] = self.wait_for
        return result
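
`column_order` has three distinct states (unset, explicitly disabled, or a list), and the listing above treats each one differently. The sketch below isolates just that branch. `serialize_column_order` is a hypothetical helper written for this illustration; it uses `hasattr(k, "to_dict")` where the library checks `isinstance(k, SortKey)`, so that no stand-in class is needed.

```python
from __future__ import annotations

from typing import Any


def serialize_column_order(column_order: list[Any] | bool | None) -> dict[str, Any]:
    """Mirror only the column_order branch of PseudoColumnDisplay.to_dict()."""
    result: dict[str, Any] = {}
    if column_order is not None:
        if column_order is False:
            # Explicit False is kept so that sorting can be switched off.
            result["column_order"] = False
        else:
            # SortKey-like objects are converted; plain entries pass through.
            result["column_order"] = [
                k.to_dict() if hasattr(k, "to_dict") else k for k in column_order
            ]
    return result


unset = serialize_column_order(None)      # key omitted entirely
disabled = serialize_column_order(False)  # key present, sorting disabled
ordered = serialize_column_order(["Name", {"column": "Created", "descending": True}])
print(unset, disabled, ordered)
```

The `is False` identity check matters here: a plain truthiness test would not distinguish "disabled" from "unset".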

SortKey dataclass

A sort key for row ordering.

Parameters:

Name        Type  Description                               Default
column      str   Column name to sort by                    required
descending  bool  Sort in descending order (default False)  False

Example

>>> SortKey("Name")  # Ascending  # doctest: +SKIP
>>> SortKey("Created", descending=True)  # Descending  # doctest: +SKIP

Source code in src/deriva_ml/model/annotations.py
@dataclass
class SortKey:
    """A sort key for row ordering.

    Args:
        column: Column name to sort by
        descending: Sort in descending order (default False)

    Example:
        >>> SortKey("Name")  # Ascending  # doctest: +SKIP
        >>> SortKey("Created", descending=True)  # Descending  # doctest: +SKIP
    """

    column: str
    descending: bool = False

    def to_dict(self) -> dict[str, Any] | str:
        """Convert to dict or string (if ascending)."""
        if self.descending:
            return {"column": self.column, "descending": True}
        return self.column

to_dict

to_dict() -> dict[str, Any] | str

Convert to dict or string (if ascending).

Source code in src/deriva_ml/model/annotations.py
def to_dict(self) -> dict[str, Any] | str:
    """Convert to dict or string (if ascending)."""
    if self.descending:
        return {"column": self.column, "descending": True}
    return self.column
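
Because `to_dict()` returns either a bare string or a dict, a list of sort keys serializes to a mixed JSON array. The sketch below uses a local copy of the `SortKey` listing above so that it runs on its own; only the `row_order` variable name is introduced here for illustration.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from typing import Any


@dataclass
class SortKey:  # local copy of the listing above
    column: str
    descending: bool = False

    def to_dict(self) -> dict[str, Any] | str:
        """Convert to dict or string (if ascending)."""
        if self.descending:
            return {"column": self.column, "descending": True}
        return self.column


row_order = [SortKey("Name"), SortKey("Created", descending=True)]
encoded = json.dumps([key.to_dict() for key in row_order])
print(encoded)  # ["Name", {"column": "Created", "descending": true}]
```

Ascending keys collapse to the column name alone, so the common case stays compact in the stored annotation.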

TableDisplay dataclass

Bases: AnnotationBuilder

Table-display annotation builder.

Controls table-level display options like row naming and ordering.

Example

>>> td = TableDisplay()  # doctest: +SKIP
>>> td.row_name(row_markdown_pattern="{{{Name}}} ({{{Species}}})")  # doctest: +SKIP
>>> td.compact(TableDisplayOptions(row_order=[SortKey("Name")]))  # doctest: +SKIP

Source code in src/deriva_ml/model/annotations.py
@dataclass
class TableDisplay(AnnotationBuilder):
    """Table-display annotation builder.

    Controls table-level display options like row naming and ordering.

    Example:
        >>> td = TableDisplay()  # doctest: +SKIP
        >>> td.row_name(row_markdown_pattern="{{{Name}}} ({{{Species}}})")  # doctest: +SKIP
        >>> td.compact(TableDisplayOptions(row_order=[SortKey("Name")]))  # doctest: +SKIP
    """

    tag = TAG_TABLE_DISPLAY

    _contexts: dict[str, TableDisplayOptions | str | None] = field(default_factory=dict)

    def set_context(self, context: str, options: TableDisplayOptions | str | None) -> "TableDisplay":
        """Set options for a context."""
        self._contexts[context] = options
        return self

    def row_name(self, row_markdown_pattern: str, template_engine: TemplateEngine | None = None) -> "TableDisplay":
        """Set row name pattern (used in foreign key dropdowns, etc.)."""
        return self.set_context(
            CONTEXT_ROW_NAME,
            TableDisplayOptions(row_markdown_pattern=row_markdown_pattern, template_engine=template_engine),
        )

    def compact(self, options: TableDisplayOptions) -> "TableDisplay":
        """Set options for compact (list) view."""
        return self.set_context(CONTEXT_COMPACT, options)

    def detailed(self, options: TableDisplayOptions) -> "TableDisplay":
        """Set options for detailed (record) view."""
        return self.set_context(CONTEXT_DETAILED, options)

    def default(self, options: TableDisplayOptions) -> "TableDisplay":
        """Set default options."""
        return self.set_context(CONTEXT_DEFAULT, options)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, options in self._contexts.items():
            if options is None:
                result[context] = None
            elif isinstance(options, str):
                result[context] = options
            else:
                result[context] = options.to_dict()
        return result
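
Every setter returns `self`, so a table-display annotation can be assembled in one chained expression. The sketch below is a trimmed, self-contained imitation of the builder above. `MiniOptions` and `MiniTableDisplay` are stand-ins written for this illustration, and the context strings `"row_name"` and `"compact"` are assumed values for the `CONTEXT_*` constants, which are not shown in this section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any

# Assumed context names; the library's CONTEXT_* constants are not shown here.
CONTEXT_ROW_NAME = "row_name"
CONTEXT_COMPACT = "compact"


@dataclass
class MiniOptions:
    """Trimmed stand-in for TableDisplayOptions (two fields only)."""

    row_markdown_pattern: str | None = None
    page_size: int | None = None

    def to_dict(self) -> dict[str, Any]:
        candidates = {
            "row_markdown_pattern": self.row_markdown_pattern,
            "page_size": self.page_size,
        }
        return {key: value for key, value in candidates.items() if value is not None}


@dataclass
class MiniTableDisplay:
    """Trimmed stand-in for the TableDisplay builder."""

    _contexts: dict[str, MiniOptions | str | None] = field(default_factory=dict)

    def set_context(self, context: str, options: MiniOptions | str | None) -> "MiniTableDisplay":
        self._contexts[context] = options
        return self  # returning self is what enables chaining

    def row_name(self, row_markdown_pattern: str) -> "MiniTableDisplay":
        return self.set_context(
            CONTEXT_ROW_NAME, MiniOptions(row_markdown_pattern=row_markdown_pattern)
        )

    def compact(self, options: MiniOptions) -> "MiniTableDisplay":
        # Takes an options object, not keyword arguments.
        return self.set_context(CONTEXT_COMPACT, options)

    def to_dict(self) -> dict[str, Any]:
        result: dict[str, Any] = {}
        for context, options in self._contexts.items():
            if options is None or isinstance(options, str):
                result[context] = options  # None and context references pass through
            else:
                result[context] = options.to_dict()
        return result


annotation = (
    MiniTableDisplay()
    .row_name("{{{Name}}} ({{{Species}}})")
    .compact(MiniOptions(page_size=50))
    .set_context("compact/brief", "compact")  # string value: reuse another context
    .to_dict()
)
print(annotation)
```

Note that `compact`, `detailed`, and `default` accept a `TableDisplayOptions` instance, while `row_name` is the only shortcut that builds the options object for you.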

compact

compact(
    options: TableDisplayOptions,
) -> "TableDisplay"

Set options for compact (list) view.

Source code in src/deriva_ml/model/annotations.py
def compact(self, options: TableDisplayOptions) -> "TableDisplay":
    """Set options for compact (list) view."""
    return self.set_context(CONTEXT_COMPACT, options)

default

default(
    options: TableDisplayOptions,
) -> "TableDisplay"

Set default options.

Source code in src/deriva_ml/model/annotations.py
def default(self, options: TableDisplayOptions) -> "TableDisplay":
    """Set default options."""
    return self.set_context(CONTEXT_DEFAULT, options)

detailed

detailed(
    options: TableDisplayOptions,
) -> "TableDisplay"

Set options for detailed (record) view.

Source code in src/deriva_ml/model/annotations.py
def detailed(self, options: TableDisplayOptions) -> "TableDisplay":
    """Set options for detailed (record) view."""
    return self.set_context(CONTEXT_DETAILED, options)

row_name

row_name(
    row_markdown_pattern: str,
    template_engine: TemplateEngine
    | None = None,
) -> "TableDisplay"

Set row name pattern (used in foreign key dropdowns, etc.).

Source code in src/deriva_ml/model/annotations.py
def row_name(self, row_markdown_pattern: str, template_engine: TemplateEngine | None = None) -> "TableDisplay":
    """Set row name pattern (used in foreign key dropdowns, etc.)."""
    return self.set_context(
        CONTEXT_ROW_NAME,
        TableDisplayOptions(row_markdown_pattern=row_markdown_pattern, template_engine=template_engine),
    )

set_context

set_context(
    context: str,
    options: TableDisplayOptions
    | str
    | None,
) -> "TableDisplay"

Set options for a context.

Source code in src/deriva_ml/model/annotations.py
def set_context(self, context: str, options: TableDisplayOptions | str | None) -> "TableDisplay":
    """Set options for a context."""
    self._contexts[context] = options
    return self

TableDisplayOptions dataclass

Options for a single table display context.

Parameters:

Name                   Type                   Description                   Default
row_order              list[SortKey] | None   Sort order for rows           None
page_size              int | None             Number of rows per page       None
row_markdown_pattern   str | None             Template for row names        None
page_markdown_pattern  str | None             Template for page header      None
separator_markdown     str | None             Template between rows         None
prefix_markdown        str | None             Template before rows          None
suffix_markdown        str | None             Template after rows           None
template_engine        TemplateEngine | None  Template engine for patterns  None
collapse_toc_panel     bool | None            Collapse TOC panel            None
hide_column_headers    bool | None            Hide column headers           None

Source code in src/deriva_ml/model/annotations.py
@dataclass
class TableDisplayOptions:
    """Options for a single table display context.

    Args:
        row_order: Sort order for rows
        page_size: Number of rows per page
        row_markdown_pattern: Template for row names
        page_markdown_pattern: Template for page header
        separator_markdown: Template between rows
        prefix_markdown: Template before rows
        suffix_markdown: Template after rows
        template_engine: Template engine for patterns
        collapse_toc_panel: Collapse TOC panel
        hide_column_headers: Hide column headers
    """

    row_order: list[SortKey] | None = None
    page_size: int | None = None
    row_markdown_pattern: str | None = None
    page_markdown_pattern: str | None = None
    separator_markdown: str | None = None
    prefix_markdown: str | None = None
    suffix_markdown: str | None = None
    template_engine: TemplateEngine | None = None
    collapse_toc_panel: bool | None = None
    hide_column_headers: bool | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.row_order is not None:
            result["row_order"] = [k.to_dict() if isinstance(k, SortKey) else k for k in self.row_order]
        if self.page_size is not None:
            result["page_size"] = self.page_size
        if self.row_markdown_pattern is not None:
            result["row_markdown_pattern"] = self.row_markdown_pattern
        if self.page_markdown_pattern is not None:
            result["page_markdown_pattern"] = self.page_markdown_pattern
        if self.separator_markdown is not None:
            result["separator_markdown"] = self.separator_markdown
        if self.prefix_markdown is not None:
            result["prefix_markdown"] = self.prefix_markdown
        if self.suffix_markdown is not None:
            result["suffix_markdown"] = self.suffix_markdown
        if self.template_engine is not None:
            result["template_engine"] = self.template_engine.value
        if self.collapse_toc_panel is not None:
            result["collapse_toc_panel"] = self.collapse_toc_panel
        if self.hide_column_headers is not None:
            result["hide_column_headers"] = self.hide_column_headers
        return result
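
The chain of `is not None` checks above means that explicit falsy values such as `False` or `0` are still emitted; only unset fields are dropped. The sketch below shows the same rule written as a loop over `dataclasses.fields`. This is an alternative formulation for illustration, not how the library implements it, and `MiniTableDisplayOptions` is a hypothetical three-field stand-in. A plain loop like this would not cover `row_order` or `template_engine`, which need conversion before serialization.

```python
from __future__ import annotations

from dataclasses import dataclass, fields
from typing import Any


@dataclass
class MiniTableDisplayOptions:
    """Three of the ten fields, enough to show the pattern."""

    page_size: int | None = None
    separator_markdown: str | None = None
    hide_column_headers: bool | None = None

    def to_dict(self) -> dict[str, Any]:
        # Keep every field that was set, including falsy values such as False.
        return {
            f.name: getattr(self, f.name)
            for f in fields(self)
            if getattr(self, f.name) is not None
        }


compact_options = MiniTableDisplayOptions(page_size=25, hide_column_headers=False).to_dict()
empty_options = MiniTableDisplayOptions().to_dict()
print(compact_options, empty_options)
```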

TemplateEngine

Bases: str, Enum

Template engine for markdown patterns.

Attributes:

Name        Type  Description
HANDLEBARS        Use Handlebars.js templating (recommended, more features)
MUSTACHE          Use Mustache templating (simpler, fewer features)


Example

>>> display = PseudoColumnDisplay(  # doctest: +SKIP
...     markdown_pattern="[{{{Name}}}]({{{URL}}})",
...     template_engine=TemplateEngine.HANDLEBARS
... )

Source code in src/deriva_ml/model/annotations.py
class TemplateEngine(str, Enum):
    """Template engine for markdown patterns.

    Attributes:
        HANDLEBARS: Use Handlebars.js templating (recommended, more features)
        MUSTACHE: Use Mustache templating (simpler, fewer features)

    Example:
        >>> display = PseudoColumnDisplay(  # doctest: +SKIP
        ...     markdown_pattern="[{{{Name}}}]({{{URL}}})",
        ...     template_engine=TemplateEngine.HANDLEBARS
        ... )
    """

    HANDLEBARS = "handlebars"
    MUSTACHE = "mustache"
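
Since `TemplateEngine` derives from both `str` and `Enum`, its members compare equal to their string values and can be looked up from a stored string. The sketch below uses a local copy of the enum so that it runs on its own; the variable names are illustrative only.

```python
from __future__ import annotations

import json
from enum import Enum


class TemplateEngine(str, Enum):  # local copy of the listing above
    HANDLEBARS = "handlebars"
    MUSTACHE = "mustache"


# Look up a member from the string found in an existing annotation.
engine = TemplateEngine("mustache")

# Members compare equal to plain strings because of the str mixin.
same_as_string = TemplateEngine.HANDLEBARS == "handlebars"

# The builders store .value, so the annotation holds a plain JSON string.
encoded = json.dumps({"template_engine": TemplateEngine.HANDLEBARS.value})
print(engine, same_as_string, encoded)
```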

VisibleColumns dataclass

Bases: AnnotationBuilder

Visible-columns annotation builder.

Controls which columns appear in different UI contexts and their order. This is one of the most commonly used annotations for customizing the Chaise interface.

Column entries can be:
- Column names (strings): "Name", "RID", "Description"
- Foreign key references: fk_constraint("schema", "constraint_name")
- Pseudo-columns: PseudoColumn(...) for computed/derived values

Contexts:
- compact: Table/list views (search results, data browser)
- detailed: Single record view (full record page)
- entry: Create/edit forms
- entry/create: Create form only
- entry/edit: Edit form only
- *: Default for all contexts

Example

Set column lists for different contexts, then stage and apply them. Staging follows the same path as every other builder: assign vc.to_dict() to table.annotations[VisibleColumns.tag], then call ml.apply_annotations()::

>>> vc = VisibleColumns()  # doctest: +SKIP
>>> vc.compact(["RID", "Name", "Status"])  # doctest: +SKIP
>>> vc.detailed(["RID", "Name", "Status", "Description", "Created"])  # doctest: +SKIP
>>> vc.entry(["Name", "Status", "Description"])  # doctest: +SKIP
>>> table.annotations[VisibleColumns.tag] = vc.to_dict()  # doctest: +SKIP
>>> ml.apply_annotations()  # doctest: +SKIP

Method chaining::

>>> vc = (VisibleColumns()  # doctest: +SKIP
...     .compact(["RID", "Name"])
...     .detailed(["RID", "Name", "Description"])
...     .entry(["Name", "Description"]))

Including foreign key references::

>>> vc = VisibleColumns()  # doctest: +SKIP
>>> vc.compact([  # doctest: +SKIP
...     "RID",
...     "Name",
...     fk_constraint("domain", "Subject_Species_fkey"),
... ])

With pseudo-columns for computed values::

>>> vc = VisibleColumns()  # doctest: +SKIP
>>> vc.compact([  # doctest: +SKIP
...     "RID",
...     "Name",
...     PseudoColumn(
...         source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
...         aggregate=Aggregate.CNT,
...         markdown_name="Samples"
...     ),
... ])

Context inheritance (reference another context)::

>>> vc = VisibleColumns()  # doctest: +SKIP
>>> vc.compact(["RID", "Name"])  # doctest: +SKIP
>>> vc.set_context("compact/brief", "compact")  # Inherit from compact  # doctest: +SKIP

With faceted search (filter context)::

>>> vc = VisibleColumns()  # doctest: +SKIP
>>> vc.compact(["RID", "Name", "Status"])  # doctest: +SKIP
>>> facets = FacetList()  # doctest: +SKIP
>>> facets.add(Facet(source="Status", open=True))  # doctest: +SKIP
>>> vc._contexts["filter"] = facets.to_dict()  # doctest: +SKIP
Source code in src/deriva_ml/model/annotations.py
@dataclass
class VisibleColumns(AnnotationBuilder):
    """Visible-columns annotation builder.

    Controls which columns appear in different UI contexts and their order.
    This is one of the most commonly used annotations for customizing the
    Chaise interface.

    Column entries can be:
    - Column names (strings): "Name", "RID", "Description"
    - Foreign key references: fk_constraint("schema", "constraint_name")
    - Pseudo-columns: PseudoColumn(...) for computed/derived values

    Contexts:
    - ``compact``: Table/list views (search results, data browser)
    - ``detailed``: Single record view (full record page)
    - ``entry``: Create/edit forms
    - ``entry/create``: Create form only
    - ``entry/edit``: Edit form only
    - ``*``: Default for all contexts

    Example:
        Set column lists for different contexts, then stage and apply them.
        Staging follows the same path as every other builder: assign
        ``vc.to_dict()`` to ``table.annotations[VisibleColumns.tag]``, then
        call ``ml.apply_annotations()``::

            >>> vc = VisibleColumns()  # doctest: +SKIP
            >>> vc.compact(["RID", "Name", "Status"])  # doctest: +SKIP
            >>> vc.detailed(["RID", "Name", "Status", "Description", "Created"])  # doctest: +SKIP
            >>> vc.entry(["Name", "Status", "Description"])  # doctest: +SKIP
            >>> table.annotations[VisibleColumns.tag] = vc.to_dict()  # doctest: +SKIP
            >>> ml.apply_annotations()  # doctest: +SKIP

        Method chaining::

            >>> vc = (VisibleColumns()  # doctest: +SKIP
            ...     .compact(["RID", "Name"])
            ...     .detailed(["RID", "Name", "Description"])
            ...     .entry(["Name", "Description"]))

        Including foreign key references::

            >>> vc = VisibleColumns()  # doctest: +SKIP
            >>> vc.compact([  # doctest: +SKIP
            ...     "RID",
            ...     "Name",
            ...     fk_constraint("domain", "Subject_Species_fkey"),
            ... ])

        With pseudo-columns for computed values::

            >>> vc = VisibleColumns()  # doctest: +SKIP
            >>> vc.compact([  # doctest: +SKIP
            ...     "RID",
            ...     "Name",
            ...     PseudoColumn(
            ...         source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
            ...         aggregate=Aggregate.CNT,
            ...         markdown_name="Samples"
            ...     ),
            ... ])

        Context inheritance (reference another context)::

            >>> vc = VisibleColumns()  # doctest: +SKIP
            >>> vc.compact(["RID", "Name"])  # doctest: +SKIP
            >>> vc.set_context("compact/brief", "compact")  # Inherit from compact  # doctest: +SKIP

        With faceted search (filter context)::

            >>> vc = VisibleColumns()  # doctest: +SKIP
            >>> vc.compact(["RID", "Name", "Status"])  # doctest: +SKIP
            >>> facets = FacetList()  # doctest: +SKIP
            >>> facets.add(Facet(source="Status", open=True))  # doctest: +SKIP
            >>> vc._contexts["filter"] = facets.to_dict()  # doctest: +SKIP
    """

    tag = TAG_VISIBLE_COLUMNS

    _contexts: dict[str, list[ColumnEntry] | str] = field(default_factory=dict)

    def set_context(self, context: str, columns: list[ColumnEntry] | str) -> "VisibleColumns":
        """Set columns for a context.

        Args:
            context: Context name (e.g., "compact", "detailed", "*")
            columns: List of columns, or string referencing another context

        Returns:
            Self for chaining
        """
        self._contexts[context] = columns
        return self

    def compact(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for compact (list) view."""
        return self.set_context(CONTEXT_COMPACT, columns)

    def detailed(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for detailed (record) view."""
        return self.set_context(CONTEXT_DETAILED, columns)

    def entry(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for entry (create/edit) forms."""
        return self.set_context(CONTEXT_ENTRY, columns)

    def entry_create(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for create form only."""
        return self.set_context(CONTEXT_ENTRY_CREATE, columns)

    def entry_edit(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for edit form only."""
        return self.set_context(CONTEXT_ENTRY_EDIT, columns)

    def default(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set default columns for all contexts."""
        return self.set_context(CONTEXT_DEFAULT, columns)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, columns in self._contexts.items():
            if isinstance(columns, str):
                result[context] = columns
            else:
                result[context] = [c.to_dict() if isinstance(c, PseudoColumn) else c for c in columns]
        return result
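
The dispatch in `to_dict()` can be seen by mixing the three kinds of column entry in one context. The sketch below is a trimmed, self-contained imitation. `MiniPseudoColumn` and `MiniVisibleColumns` are stand-ins written for this illustration, and the context names come from the list in the docstring above.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any


@dataclass
class MiniPseudoColumn:
    """Hypothetical stand-in: only what the to_dict() dispatch needs."""

    source: str
    markdown_name: str

    def to_dict(self) -> dict[str, Any]:
        return {"source": self.source, "markdown_name": self.markdown_name}


@dataclass
class MiniVisibleColumns:
    """Trimmed stand-in for the VisibleColumns builder."""

    _contexts: dict[str, list[Any] | str] = field(default_factory=dict)

    def set_context(self, context: str, columns: list[Any] | str) -> "MiniVisibleColumns":
        self._contexts[context] = columns
        return self

    def to_dict(self) -> dict[str, Any]:
        result: dict[str, Any] = {}
        for context, columns in self._contexts.items():
            if isinstance(columns, str):
                # A string value is a reference to another context.
                result[context] = columns
            else:
                # Pseudo-columns are expanded; names and FK pairs pass through.
                result[context] = [
                    c.to_dict() if isinstance(c, MiniPseudoColumn) else c for c in columns
                ]
        return result


annotation = (
    MiniVisibleColumns()
    .set_context(
        "compact",
        [
            "RID",                                # plain column name
            ["domain", "Subject_Species_fkey"],   # foreign key reference
            MiniPseudoColumn("URL", "Download"),  # pseudo-column
        ],
    )
    .set_context("compact/brief", "compact")
    .to_dict()
)
print(annotation)
```

Only pseudo-column objects are converted; everything else is already in its serialized form when it is passed in.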

compact

compact(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for compact (list) view.

Source code in src/deriva_ml/model/annotations.py
def compact(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for compact (list) view."""
    return self.set_context(CONTEXT_COMPACT, columns)

default

default(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set default columns for all contexts.

Source code in src/deriva_ml/model/annotations.py
def default(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set default columns for all contexts."""
    return self.set_context(CONTEXT_DEFAULT, columns)

detailed

detailed(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for detailed (record) view.

Source code in src/deriva_ml/model/annotations.py
def detailed(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for detailed (record) view."""
    return self.set_context(CONTEXT_DETAILED, columns)

entry

entry(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for entry (create/edit) forms.

Source code in src/deriva_ml/model/annotations.py
def entry(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for entry (create/edit) forms."""
    return self.set_context(CONTEXT_ENTRY, columns)

entry_create

entry_create(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for create form only.

Source code in src/deriva_ml/model/annotations.py
def entry_create(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for create form only."""
    return self.set_context(CONTEXT_ENTRY_CREATE, columns)

entry_edit

entry_edit(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for edit form only.

Source code in src/deriva_ml/model/annotations.py
def entry_edit(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for edit form only."""
    return self.set_context(CONTEXT_ENTRY_EDIT, columns)

set_context

set_context(
    context: str,
    columns: list[ColumnEntry] | str,
) -> "VisibleColumns"

Set columns for a context.

Parameters:

Name     Type                     Description                                             Default
context  str                      Context name (e.g., "compact", "detailed", "*")         required
columns  list[ColumnEntry] | str  List of columns, or string referencing another context  required

Returns:

Type              Description
'VisibleColumns'  Self for chaining

Source code in src/deriva_ml/model/annotations.py
def set_context(self, context: str, columns: list[ColumnEntry] | str) -> "VisibleColumns":
    """Set columns for a context.

    Args:
        context: Context name (e.g., "compact", "detailed", "*")
        columns: List of columns, or string referencing another context

    Returns:
        Self for chaining
    """
    self._contexts[context] = columns
    return self

VisibleForeignKeys dataclass

Bases: AnnotationBuilder

Visible-foreign-keys annotation builder.

Controls which related tables appear in the UI via inbound foreign keys.

Example

>>> vfk = VisibleForeignKeys()  # doctest: +SKIP
>>> vfk.detailed([  # doctest: +SKIP
...     fk_constraint("domain", "Image_Subject_fkey"),
...     fk_constraint("domain", "Diagnosis_Subject_fkey")
... ])

Source code in src/deriva_ml/model/annotations.py
@dataclass
class VisibleForeignKeys(AnnotationBuilder):
    """Visible-foreign-keys annotation builder.

    Controls which related tables appear in the UI via inbound foreign keys.

    Example:
        >>> vfk = VisibleForeignKeys()  # doctest: +SKIP
        >>> vfk.detailed([  # doctest: +SKIP
        ...     fk_constraint("domain", "Image_Subject_fkey"),
        ...     fk_constraint("domain", "Diagnosis_Subject_fkey")
        ... ])
    """

    tag = TAG_VISIBLE_FOREIGN_KEYS

    _contexts: dict[str, list[ForeignKeyEntry] | str] = field(default_factory=dict)

    def set_context(self, context: str, foreign_keys: list[ForeignKeyEntry] | str) -> "VisibleForeignKeys":
        """Set foreign keys for a context."""
        self._contexts[context] = foreign_keys
        return self

    def detailed(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
        """Set foreign keys for detailed view."""
        return self.set_context(CONTEXT_DETAILED, foreign_keys)

    def default(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
        """Set default foreign keys for all contexts."""
        return self.set_context(CONTEXT_DEFAULT, foreign_keys)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, fkeys in self._contexts.items():
            if isinstance(fkeys, str):
                result[context] = fkeys
            else:
                result[context] = [fk.to_dict() if isinstance(fk, PseudoColumn) else fk for fk in fkeys]
        return result

default

default(
    foreign_keys: list[ForeignKeyEntry],
) -> "VisibleForeignKeys"

Set default foreign keys for all contexts.

Source code in src/deriva_ml/model/annotations.py
def default(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
    """Set default foreign keys for all contexts."""
    return self.set_context(CONTEXT_DEFAULT, foreign_keys)

detailed

detailed(
    foreign_keys: list[ForeignKeyEntry],
) -> "VisibleForeignKeys"

Set foreign keys for detailed view.

Source code in src/deriva_ml/model/annotations.py
def detailed(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
    """Set foreign keys for detailed view."""
    return self.set_context(CONTEXT_DETAILED, foreign_keys)

set_context

set_context(
    context: str,
    foreign_keys: list[ForeignKeyEntry]
    | str,
) -> "VisibleForeignKeys"

Set foreign keys for a context.

Source code in src/deriva_ml/model/annotations.py
def set_context(self, context: str, foreign_keys: list[ForeignKeyEntry] | str) -> "VisibleForeignKeys":
    """Set foreign keys for a context."""
    self._contexts[context] = foreign_keys
    return self

__getattr__

__getattr__(name: str)

Lazy import for DatabaseModel and DerivaMLBagView.

Source code in src/deriva_ml/model/__init__.py
def __getattr__(name: str):
    """Lazy import for DatabaseModel and DerivaMLBagView."""
    if name == "DatabaseModel":
        from deriva_ml.model.database import DatabaseModel

        return DatabaseModel
    if name == "DerivaMLBagView":
        from deriva_ml.model.deriva_ml_bag_view import DerivaMLBagView

        return DerivaMLBagView
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
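
The lazy-import mechanism relies on module-level `__getattr__` (PEP 562), which Python calls only when normal attribute lookup on the module fails. The sketch below reproduces the idea on a throwaway module object so that it runs on its own. `lazy_demo` and `_module_getattr` are hypothetical names, and `collections.OrderedDict` stands in for the deferred `DatabaseModel` import.

```python
from __future__ import annotations

import types

# Build a throwaway module object so the sketch is self-contained.
lazy_mod = types.ModuleType("lazy_demo")


def _module_getattr(name: str):
    """Resolve selected names on first access, as a module-level __getattr__ does."""
    if name == "OrderedDict":
        # Deferred import: runs only when lazy_demo.OrderedDict is requested.
        from collections import OrderedDict

        return OrderedDict
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")


# PEP 562: a function named __getattr__ in the module namespace handles
# attributes that are not found through normal lookup.
lazy_mod.__getattr__ = _module_getattr

resolved = lazy_mod.OrderedDict  # triggers the deferred import

try:
    lazy_mod.Missing
except AttributeError as exc:
    error_text = str(exc)

print(resolved, error_text)
```

Raising `AttributeError` for unknown names keeps `hasattr()` and `from module import name` behaving as they would for an ordinary module.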

fk_constraint

fk_constraint(
    schema: str, constraint: str
) -> list[str]

Create a foreign key constraint reference for visible-columns.

Use this in visible-columns to include a foreign key column (showing the referenced row's name/link). This is different from InboundFK/OutboundFK, which are used inside PseudoColumn source paths.

Parameters:

Name        Type  Description                               Default
schema      str   Schema name containing the FK constraint  required
constraint  str   Foreign key constraint name               required

Returns:

Type       Description
list[str]  [schema, constraint] list for use in visible-columns

Example

Include a foreign key in visible columns::

>>> vc = VisibleColumns()  # doctest: +SKIP
>>> vc.compact([  # doctest: +SKIP
...     "RID",
...     "Name",
...     fk_constraint("domain", "Subject_Species_fkey"),  # Shows Species
... ])

This is equivalent to the raw format::

>>> vc.compact(["RID", "Name", ["domain", "Subject_Species_fkey"]])  # doctest: +SKIP
Source code in src/deriva_ml/model/annotations.py
def fk_constraint(schema: str, constraint: str) -> list[str]:
    """Create a foreign key constraint reference for visible-columns.

    Use this in visible-columns to include a foreign key column (showing the
    referenced row's name/link). This is different from InboundFK/OutboundFK,
    which are used inside PseudoColumn source paths.

    Args:
        schema: Schema name containing the FK constraint
        constraint: Foreign key constraint name

    Returns:
        [schema, constraint] list for use in visible-columns

    Example:
        Include a foreign key in visible columns::

            >>> vc = VisibleColumns()  # doctest: +SKIP
            >>> vc.compact([  # doctest: +SKIP
            ...     "RID",
            ...     "Name",
            ...     fk_constraint("domain", "Subject_Species_fkey"),  # Shows Species
            ... ])

        This is equivalent to the raw format::

            >>> vc.compact(["RID", "Name", ["domain", "Subject_Species_fkey"]])  # doctest: +SKIP
    """
    return [schema, constraint]
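
Because `fk_constraint` returns a plain two-element list, the helper and the raw format are interchangeable once serialized. The short sketch below checks this with a local copy of the function; `compact_columns` is an illustrative name only.

```python
from __future__ import annotations

import json


def fk_constraint(schema: str, constraint: str) -> list[str]:  # local copy of the listing above
    return [schema, constraint]


compact_columns = ["RID", "Name", fk_constraint("domain", "Subject_Species_fkey")]
encoded = json.dumps({"compact": compact_columns})
print(encoded)
```

In the serialized list, a bare string names a column and a two-element list names a foreign key constraint, so the helper mainly serves to make that intent readable in Python code.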