DerivaModel

The DerivaModel class provides schema introspection and manipulation capabilities for Deriva catalogs. It handles table relationships, associations, and catalog structure management.

Model module for DerivaML.

This module provides catalog and database model classes, as well as handle wrappers for ERMrest model objects and annotation builders.

Key components:

- DerivaModel: Schema analysis utilities
- DatabaseModel: SQLite database from BDBag
- SchemaBuilder/SchemaORM: Create ORM from Deriva Model (Phase 1)
- DataLoader: Fill database from data source (Phase 2)
- DataSource: Protocol for data sources (BagDataSource, CatalogDataSource)
- ForeignKeyOrderer: Compute FK-safe insertion order

Lazy imports are used for DatabaseModel and DerivaMLDatabase to avoid circular imports with the dataset module.
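The lazy-import note above can be made concrete. Below is a minimal sketch of the pattern, assuming nothing about the real module layout; the demonstration target (`json.JSONDecoder`) is a stand-in, not an actual deriva_ml module:

```python
import importlib

_cache: dict[str, object] = {}

def lazy_attr(module_name: str, attr: str):
    """Resolve module_name.attr on first use instead of at import time.

    Deferring the import to call time is what breaks an import cycle:
    by the time a caller needs the class, both modules are initialized.
    """
    key = f"{module_name}.{attr}"
    if key not in _cache:
        _cache[key] = getattr(importlib.import_module(module_name), attr)
    return _cache[key]

# Stand-in demonstration target (not a deriva_ml module):
decoder_cls = lazy_attr("json", "JSONDecoder")
```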

Aggregate

Bases: str, Enum

Aggregation functions for pseudo-columns.

Used when a pseudo-column follows an inbound foreign key and returns multiple values that need to be aggregated.

Attributes:

- MIN: Minimum value
- MAX: Maximum value
- CNT: Count of values
- CNT_D: Count of distinct values
- ARRAY: Array of all values
- ARRAY_D: Array of distinct values

Example

Count related records:

pc = PseudoColumn(
    source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
    aggregate=Aggregate.CNT,
    markdown_name="Sample Count"
)

Get distinct values as array:

pc = PseudoColumn(
    source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
    aggregate=Aggregate.ARRAY_D,
    markdown_name="Tags"
)

Source code in src/deriva_ml/model/annotations.py
class Aggregate(str, Enum):
    """Aggregation functions for pseudo-columns.

    Used when a pseudo-column follows an inbound foreign key and returns
    multiple values that need to be aggregated.

    Attributes:
        MIN: Minimum value
        MAX: Maximum value
        CNT: Count of values
        CNT_D: Count of distinct values
        ARRAY: Array of all values
        ARRAY_D: Array of distinct values

    Example:
        >>> # Count related records
        >>> pc = PseudoColumn(
        ...     source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
        ...     aggregate=Aggregate.CNT,
        ...     markdown_name="Sample Count"
        ... )
        >>>
        >>> # Get distinct values as array
        >>> pc = PseudoColumn(
        ...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
        ...     aggregate=Aggregate.ARRAY_D,
        ...     markdown_name="Tags"
        ... )
    """
    MIN = "min"
    MAX = "max"
    CNT = "cnt"
    CNT_D = "cnt_d"
    ARRAY = "array"
    ARRAY_D = "array_d"
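Because Aggregate mixes in str, members compare equal to their values and drop straight into annotation JSON. A quick standalone check, re-declaring a two-member version of the enum:

```python
import json
from enum import Enum

class Aggregate(str, Enum):
    # Two members are enough to show the str-mixin behavior.
    MIN = "min"
    CNT = "cnt"

# Members are also plain strings, so JSON encoding needs no converter.
assert Aggregate.CNT == "cnt"
print(json.dumps({"aggregate": Aggregate.CNT}))  # {"aggregate": "cnt"}
```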

ArrayUxMode

Bases: str, Enum

Display modes for array values in pseudo-columns.

Controls how arrays of values are rendered in the UI.

Attributes:

- RAW: Raw array display
- CSV: Comma-separated values
- OLIST: Ordered (numbered) list
- ULIST: Unordered (bulleted) list

Example

pc = PseudoColumn(
    source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
    aggregate=Aggregate.ARRAY,
    display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV)
)

Source code in src/deriva_ml/model/annotations.py
class ArrayUxMode(str, Enum):
    """Display modes for array values in pseudo-columns.

    Controls how arrays of values are rendered in the UI.

    Attributes:
        RAW: Raw array display
        CSV: Comma-separated values
        OLIST: Ordered (numbered) list
        ULIST: Unordered (bulleted) list

    Example:
        >>> pc = PseudoColumn(
        ...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
        ...     aggregate=Aggregate.ARRAY,
        ...     display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV)
        ... )
    """
    RAW = "raw"
    CSV = "csv"
    OLIST = "olist"
    ULIST = "ulist"
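The four modes map to familiar text renderings. A hypothetical renderer makes the differences concrete; the real rendering is done by the Chaise UI, not by this library:

```python
def render_array(values: list[str], mode: str) -> str:
    # Illustrative only: mirrors the intent of each ArrayUxMode value.
    if mode == "raw":
        return str(values)
    if mode == "csv":
        return ", ".join(values)
    if mode == "olist":
        return "\n".join(f"{i}. {v}" for i, v in enumerate(values, 1))
    if mode == "ulist":
        return "\n".join(f"- {v}" for v in values)
    raise ValueError(f"unknown mode: {mode}")

print(render_array(["red", "blue"], "csv"))  # red, blue
```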

BagDataSource

DataSource implementation for BDBag directories.

Reads data from CSV files in a bag's data/ directory. Handles asset URL localization via fetch.txt.

Example

source = BagDataSource(Path("/path/to/bag"))

# List available tables
print(source.list_available_tables())

# Get data for a table
for row in source.get_table_data("Image"):
    print(row["Filename"])

Source code in src/deriva_ml/model/data_sources.py
class BagDataSource:
    """DataSource implementation for BDBag directories.

    Reads data from CSV files in a bag's data/ directory.
    Handles asset URL localization via fetch.txt.

    Example:
        source = BagDataSource(Path("/path/to/bag"))

        # List available tables
        print(source.list_available_tables())

        # Get data for a table
        for row in source.get_table_data("Image"):
            print(row["Filename"])
    """

    def __init__(
        self,
        bag_path: Path,
        model: Model | None = None,
        asset_localization: bool = True,
    ):
        """Initialize from a bag path.

        Args:
            bag_path: Path to BDBag directory.
            model: Optional ERMrest Model for schema info. If not provided,
                will try to load from bag's schema.json.
            asset_localization: Whether to localize asset URLs to local paths
                using fetch.txt mapping.
        """
        self.bag_path = Path(bag_path)
        self.data_path = self.bag_path / "data"

        # Load model if not provided
        if model is None:
            schema_file = self.data_path / "schema.json"
            if schema_file.exists():
                self.model = Model.fromfile("file-system", schema_file)
            else:
                self.model = None
                logger.warning(f"No schema.json found in {self.bag_path}")
        else:
            self.model = model

        # Build asset map for URL localization
        self._asset_map = self._build_asset_map() if asset_localization else {}

        # Cache of table name -> list of csv file paths (multiple paths for nested datasets)
        self._csv_cache: dict[str, list[Path]] = {}
        self._build_csv_cache()

    def _build_csv_cache(self) -> None:
        """Build cache mapping table names to CSV file paths.

        Nested datasets can produce multiple CSV files for the same table
        at different directory depths. All paths are collected so that
        get_table_data() yields the union of all rows.
        """
        for csv_file in self.data_path.rglob("*.csv"):
            table_name = csv_file.stem
            self._csv_cache.setdefault(table_name, []).append(csv_file)

    def _build_asset_map(self) -> dict[str, str]:
        """Build a map from remote URLs to local file paths using fetch.txt.

        Returns:
            Dictionary mapping URL paths to local file paths.
        """
        fetch_map = {}
        fetch_file = self.bag_path / "fetch.txt"

        if not fetch_file.exists():
            logger.debug(f"No fetch.txt in bag {self.bag_path.name}")
            return fetch_map

        try:
            with fetch_file.open(newline="\n") as f:
                for row in f:
                    # Rows in fetch.txt are tab-separated: URL, size, local_path
                    fields = row.split("\t")
                    if len(fields) >= 3:
                        local_file = fields[2].replace("\n", "")
                        local_path = f"{self.bag_path}/{local_file}"
                        fetch_map[urlparse(fields[0]).path] = local_path
        except Exception as e:
            logger.warning(f"Error reading fetch.txt: {e}")

        return fetch_map

    def _get_table_name(self, table: DerivaTable | str) -> str:
        """Extract table name from table object or string."""
        if isinstance(table, DerivaTable):
            return table.name
        # Handle schema.table format
        if "." in table:
            return table.split(".")[-1]
        return table

    def _is_asset_table(self, table_name: str) -> bool:
        """Check if a table is an asset table."""
        if self.model is None:
            return False
        for schema in self.model.schemas.values():
            if table_name in schema.tables:
                return schema.tables[table_name].is_asset()
        return False

    def _localize_asset_row(self, row: dict[str, Any]) -> dict[str, Any]:
        """Replace URL with local path in asset table row.

        Args:
            row: Dictionary of column values.

        Returns:
            Updated dictionary with localized file path.
        """
        if "URL" in row and "Filename" in row:
            url = row.get("URL")
            if url and url in self._asset_map:
                row = dict(row)  # Copy to avoid mutating original
                row["Filename"] = self._asset_map[url]
        return row

    def get_table_data(
        self,
        table: DerivaTable | str,
    ) -> Iterator[dict[str, Any]]:
        """Read table data from CSV files.

        Nested datasets may produce multiple CSV files for the same table
        at different directory depths. This method yields rows from all of
        them so that the full dataset (including parent and child records)
        is loaded.

        Args:
            table: Table object or name.

        Yields:
            Dictionary per row with column names as keys.
        """
        table_name = self._get_table_name(table)
        csv_files = self._csv_cache.get(table_name)

        if not csv_files:
            logger.debug(f"No CSV file found for table {table_name}")
            return

        is_asset = self._is_asset_table(table_name)

        for csv_file in csv_files:
            if not csv_file.exists():
                continue
            with csv_file.open(newline="") as f:
                reader = csv.DictReader(f)
                for row in reader:
                    if is_asset and self._asset_map:
                        row = self._localize_asset_row(row)
                    yield row

    def has_table(self, table: DerivaTable | str) -> bool:
        """Check if CSV exists for table.

        Args:
            table: Table object or name.

        Returns:
            True if CSV file exists for this table.
        """
        table_name = self._get_table_name(table)
        return table_name in self._csv_cache

    def list_available_tables(self) -> list[str]:
        """List all CSV files in data directory.

        Returns:
            List of table names (without .csv extension).
        """
        return sorted(self._csv_cache.keys())

    def get_row_count(self, table: DerivaTable | str) -> int:
        """Get the number of rows across all CSV files for a table.

        Args:
            table: Table object or name.

        Returns:
            Number of data rows (excluding headers).
        """
        table_name = self._get_table_name(table)
        csv_files = self._csv_cache.get(table_name)

        if not csv_files:
            return 0

        total = 0
        for csv_file in csv_files:
            if csv_file.exists():
                with csv_file.open(newline="") as f:
                    # Count lines minus header
                    total += sum(1 for _ in f) - 1
        return total
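The fetch.txt handling in _build_asset_map above can be sketched standalone. Each fetch.txt row is tab-separated (URL, size, local_path), and the map is keyed by the URL path so asset rows can be localized later; the demo bag directory and entry below are made up:

```python
import tempfile
from pathlib import Path
from urllib.parse import urlparse

def build_asset_map(bag_path: Path) -> dict[str, str]:
    """Map remote URL paths to local file paths from fetch.txt."""
    fetch_map: dict[str, str] = {}
    fetch_file = bag_path / "fetch.txt"
    if not fetch_file.exists():
        return fetch_map
    for row in fetch_file.read_text().splitlines():
        fields = row.split("\t")
        if len(fields) >= 3:
            # Key by URL path only, matching how asset URLs are looked up.
            fetch_map[urlparse(fields[0]).path] = f"{bag_path}/{fields[2]}"
    return fetch_map

# Demo with a throwaway bag directory holding one fetch.txt entry.
with tempfile.TemporaryDirectory() as d:
    bag = Path(d)
    (bag / "fetch.txt").write_text(
        "https://example.org/hatrac/img.png\t1234\tdata/assets/img.png\n"
    )
    mapping = build_asset_map(bag)
    print(mapping["/hatrac/img.png"].endswith("data/assets/img.png"))  # True
```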

__init__

__init__(
    bag_path: Path,
    model: Model | None = None,
    asset_localization: bool = True,
)

Initialize from a bag path.

Parameters:

- bag_path (Path, required): Path to BDBag directory.
- model (Model | None, default None): Optional ERMrest Model for schema info. If not provided, will try to load from bag's schema.json.
- asset_localization (bool, default True): Whether to localize asset URLs to local paths using fetch.txt mapping.
Source code in src/deriva_ml/model/data_sources.py
def __init__(
    self,
    bag_path: Path,
    model: Model | None = None,
    asset_localization: bool = True,
):
    """Initialize from a bag path.

    Args:
        bag_path: Path to BDBag directory.
        model: Optional ERMrest Model for schema info. If not provided,
            will try to load from bag's schema.json.
        asset_localization: Whether to localize asset URLs to local paths
            using fetch.txt mapping.
    """
    self.bag_path = Path(bag_path)
    self.data_path = self.bag_path / "data"

    # Load model if not provided
    if model is None:
        schema_file = self.data_path / "schema.json"
        if schema_file.exists():
            self.model = Model.fromfile("file-system", schema_file)
        else:
            self.model = None
            logger.warning(f"No schema.json found in {self.bag_path}")
    else:
        self.model = model

    # Build asset map for URL localization
    self._asset_map = self._build_asset_map() if asset_localization else {}

    # Cache of table name -> list of csv file paths (multiple paths for nested datasets)
    self._csv_cache: dict[str, list[Path]] = {}
    self._build_csv_cache()

get_row_count

get_row_count(
    table: DerivaTable | str,
) -> int

Get the number of rows across all CSV files for a table.

Parameters:

- table (DerivaTable | str, required): Table object or name.

Returns:

- int: Number of data rows (excluding headers).

Source code in src/deriva_ml/model/data_sources.py
def get_row_count(self, table: DerivaTable | str) -> int:
    """Get the number of rows across all CSV files for a table.

    Args:
        table: Table object or name.

    Returns:
        Number of data rows (excluding headers).
    """
    table_name = self._get_table_name(table)
    csv_files = self._csv_cache.get(table_name)

    if not csv_files:
        return 0

    total = 0
    for csv_file in csv_files:
        if csv_file.exists():
            with csv_file.open(newline="") as f:
                # Count lines minus header
                total += sum(1 for _ in f) - 1
    return total
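get_row_count above counts newline-delimited lines minus one header per file. The same count can be sketched with csv.reader, which (unlike a raw line count) also treats a quoted field containing a newline as part of one row:

```python
import csv
import io

def count_rows(csv_text: str) -> int:
    # Parse with csv.reader so quoted fields with embedded newlines
    # still count as a single row; subtract the header row.
    rows = list(csv.reader(io.StringIO(csv_text)))
    return max(len(rows) - 1, 0)

print(count_rows("RID,Name\n1-A,alpha\n1-B,beta\n"))        # 2
print(count_rows('RID,Note\n1-A,"line one\nline two"\n'))   # 1
```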

get_table_data

get_table_data(
    table: DerivaTable | str,
) -> Iterator[dict[str, Any]]

Read table data from CSV files.

Nested datasets may produce multiple CSV files for the same table at different directory depths. This method yields rows from all of them so that the full dataset (including parent and child records) is loaded.

Parameters:

- table (DerivaTable | str, required): Table object or name.

Yields:

- dict[str, Any]: Dictionary per row with column names as keys.

Source code in src/deriva_ml/model/data_sources.py
def get_table_data(
    self,
    table: DerivaTable | str,
) -> Iterator[dict[str, Any]]:
    """Read table data from CSV files.

    Nested datasets may produce multiple CSV files for the same table
    at different directory depths. This method yields rows from all of
    them so that the full dataset (including parent and child records)
    is loaded.

    Args:
        table: Table object or name.

    Yields:
        Dictionary per row with column names as keys.
    """
    table_name = self._get_table_name(table)
    csv_files = self._csv_cache.get(table_name)

    if not csv_files:
        logger.debug(f"No CSV file found for table {table_name}")
        return

    is_asset = self._is_asset_table(table_name)

    for csv_file in csv_files:
        if not csv_file.exists():
            continue
        with csv_file.open(newline="") as f:
            reader = csv.DictReader(f)
            for row in reader:
                if is_asset and self._asset_map:
                    row = self._localize_asset_row(row)
                yield row

has_table

has_table(table: DerivaTable | str) -> bool

Check if CSV exists for table.

Parameters:

- table (DerivaTable | str, required): Table object or name.

Returns:

- bool: True if CSV file exists for this table.

Source code in src/deriva_ml/model/data_sources.py
def has_table(self, table: DerivaTable | str) -> bool:
    """Check if CSV exists for table.

    Args:
        table: Table object or name.

    Returns:
        True if CSV file exists for this table.
    """
    table_name = self._get_table_name(table)
    return table_name in self._csv_cache

list_available_tables

list_available_tables() -> list[str]

List all CSV files in data directory.

Returns:

- list[str]: List of table names (without .csv extension).

Source code in src/deriva_ml/model/data_sources.py
def list_available_tables(self) -> list[str]:
    """List all CSV files in data directory.

    Returns:
        List of table names (without .csv extension).
    """
    return sorted(self._csv_cache.keys())

CatalogDataSource

DataSource implementation for remote Deriva catalog.

Fetches data via ERMrest API / datapath with pagination support.

Example

catalog = server.connect_ermrest(catalog_id)
source = CatalogDataSource(catalog, schemas=['domain', 'deriva-ml'])

# List available tables
print(source.list_available_tables())

# Get data for a table
for row in source.get_table_data("Image"):
    print(row["Filename"])

Source code in src/deriva_ml/model/data_sources.py
class CatalogDataSource:
    """DataSource implementation for remote Deriva catalog.

    Fetches data via ERMrest API / datapath with pagination support.

    Example:
        catalog = server.connect_ermrest(catalog_id)
        source = CatalogDataSource(catalog, schemas=['domain', 'deriva-ml'])

        # List available tables
        print(source.list_available_tables())

        # Get data for a table
        for row in source.get_table_data("Image"):
            print(row["Filename"])
    """

    def __init__(
        self,
        catalog: ErmrestCatalog,
        schemas: list[str],
        batch_size: int = 1000,
    ):
        """Initialize from catalog connection.

        Args:
            catalog: ERMrest catalog connection.
            schemas: Schemas to fetch data from.
            batch_size: Number of rows per API request.
        """
        self.catalog = catalog
        self.schemas = schemas
        self.batch_size = batch_size
        self._pb = catalog.getPathBuilder()
        self._model = catalog.getCatalogModel()

    def _get_table_info(self, table: DerivaTable | str) -> tuple[str, str] | None:
        """Get schema and table name for a table.

        Args:
            table: Table object or name.

        Returns:
            Tuple of (schema_name, table_name) or None if not found.
        """
        if isinstance(table, DerivaTable):
            return table.schema.name, table.name

        # Handle schema.table format
        if "." in table:
            parts = table.split(".")
            schema_name, table_name = parts[0], parts[1]
            if schema_name in self.schemas:
                return schema_name, table_name
            return None

        # Search schemas for table
        for schema_name in self.schemas:
            if schema_name in self._model.schemas:
                schema = self._model.schemas[schema_name]
                if table in schema.tables:
                    return schema_name, table

        return None

    def get_table_data(
        self,
        table: DerivaTable | str,
    ) -> Iterator[dict[str, Any]]:
        """Fetch table data via ERMrest API.

        Uses pagination to handle large tables efficiently.

        Args:
            table: Table object or name.

        Yields:
            Dictionary per row with column names as keys.
        """
        table_info = self._get_table_info(table)
        if table_info is None:
            logger.warning(f"Table {table} not found in schemas {self.schemas}")
            return

        schema_name, table_name = table_info

        # Build path
        path = self._pb.schemas[schema_name].tables[table_name]

        # Paginated fetch using RID ordering
        last_rid = None
        while True:
            # Build query with optional RID filter
            query = path.entities()
            if last_rid is not None:
                query = query.filter(path.RID > last_rid)

            # Fetch batch ordered by RID
            try:
                entities = list(query.sort(path.RID).fetch(limit=self.batch_size))
            except Exception as e:
                logger.error(f"Error fetching from {schema_name}.{table_name}: {e}")
                break

            if not entities:
                break

            for entity in entities:
                yield dict(entity)

            # Track last RID for pagination
            last_rid = entities[-1]["RID"]

            if len(entities) < self.batch_size:
                break

    def has_table(self, table: DerivaTable | str) -> bool:
        """Check if table exists in catalog.

        Args:
            table: Table object or name.

        Returns:
            True if table exists in configured schemas.
        """
        return self._get_table_info(table) is not None

    def list_available_tables(self) -> list[str]:
        """List all tables in configured schemas.

        Returns:
            List of fully-qualified table names (schema.table).
        """
        tables = []
        for schema_name in self.schemas:
            if schema_name in self._model.schemas:
                schema = self._model.schemas[schema_name]
                for table_name in schema.tables.keys():
                    tables.append(f"{schema_name}.{table_name}")
        return sorted(tables)

    def get_row_count(self, table: DerivaTable | str) -> int:
        """Get the number of rows in a table.

        Args:
            table: Table object or name.

        Returns:
            Number of rows in the table.
        """
        table_info = self._get_table_info(table)
        if table_info is None:
            return 0

        schema_name, table_name = table_info
        path = self._pb.schemas[schema_name].tables[table_name]

        try:
            # Use count aggregate
            result = path.aggregates(path.RID.cnt.alias("count")).fetch()
            return result[0]["count"] if result else 0
        except Exception as e:
            logger.error(f"Error counting {schema_name}.{table_name}: {e}")
            return 0
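get_table_data above pages through a table with RID keyset pagination: fetch a batch sorted by RID, then resume with a filter on RID greater than the last one seen. The loop can be sketched against an in-memory table standing in for the datapath query:

```python
from typing import Any, Iterator

def paginate(rows: list[dict[str, Any]], batch_size: int) -> Iterator[dict[str, Any]]:
    # Keyset pagination on RID: each batch resumes strictly after the
    # last RID already yielded, so rows are never skipped or repeated
    # even if the fetch is interrupted between batches.
    last_rid = None
    while True:
        batch = sorted(
            (r for r in rows if last_rid is None or r["RID"] > last_rid),
            key=lambda r: r["RID"],
        )[:batch_size]
        if not batch:
            break
        yield from batch
        last_rid = batch[-1]["RID"]
        if len(batch) < batch_size:
            break

table = [{"RID": rid} for rid in ("1-C", "1-A", "1-B")]
print([r["RID"] for r in paginate(table, 2)])  # ['1-A', '1-B', '1-C']
```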

__init__

__init__(
    catalog: ErmrestCatalog,
    schemas: list[str],
    batch_size: int = 1000,
)

Initialize from catalog connection.

Parameters:

- catalog (ErmrestCatalog, required): ERMrest catalog connection.
- schemas (list[str], required): Schemas to fetch data from.
- batch_size (int, default 1000): Number of rows per API request.
Source code in src/deriva_ml/model/data_sources.py
def __init__(
    self,
    catalog: ErmrestCatalog,
    schemas: list[str],
    batch_size: int = 1000,
):
    """Initialize from catalog connection.

    Args:
        catalog: ERMrest catalog connection.
        schemas: Schemas to fetch data from.
        batch_size: Number of rows per API request.
    """
    self.catalog = catalog
    self.schemas = schemas
    self.batch_size = batch_size
    self._pb = catalog.getPathBuilder()
    self._model = catalog.getCatalogModel()

get_row_count

get_row_count(
    table: DerivaTable | str,
) -> int

Get the number of rows in a table.

Parameters:

- table (DerivaTable | str, required): Table object or name.

Returns:

- int: Number of rows in the table.

Source code in src/deriva_ml/model/data_sources.py
def get_row_count(self, table: DerivaTable | str) -> int:
    """Get the number of rows in a table.

    Args:
        table: Table object or name.

    Returns:
        Number of rows in the table.
    """
    table_info = self._get_table_info(table)
    if table_info is None:
        return 0

    schema_name, table_name = table_info
    path = self._pb.schemas[schema_name].tables[table_name]

    try:
        # Use count aggregate
        result = path.aggregates(path.RID.cnt.alias("count")).fetch()
        return result[0]["count"] if result else 0
    except Exception as e:
        logger.error(f"Error counting {schema_name}.{table_name}: {e}")
        return 0

get_table_data

get_table_data(
    table: DerivaTable | str,
) -> Iterator[dict[str, Any]]

Fetch table data via ERMrest API.

Uses pagination to handle large tables efficiently.

Parameters:

- table (DerivaTable | str, required): Table object or name.

Yields:

- dict[str, Any]: Dictionary per row with column names as keys.

Source code in src/deriva_ml/model/data_sources.py
def get_table_data(
    self,
    table: DerivaTable | str,
) -> Iterator[dict[str, Any]]:
    """Fetch table data via ERMrest API.

    Uses pagination to handle large tables efficiently.

    Args:
        table: Table object or name.

    Yields:
        Dictionary per row with column names as keys.
    """
    table_info = self._get_table_info(table)
    if table_info is None:
        logger.warning(f"Table {table} not found in schemas {self.schemas}")
        return

    schema_name, table_name = table_info

    # Build path
    path = self._pb.schemas[schema_name].tables[table_name]

    # Paginated fetch using RID ordering
    last_rid = None
    while True:
        # Build query with optional RID filter
        query = path.entities()
        if last_rid is not None:
            query = query.filter(path.RID > last_rid)

        # Fetch batch ordered by RID
        try:
            entities = list(query.sort(path.RID).fetch(limit=self.batch_size))
        except Exception as e:
            logger.error(f"Error fetching from {schema_name}.{table_name}: {e}")
            break

        if not entities:
            break

        for entity in entities:
            yield dict(entity)

        # Track last RID for pagination
        last_rid = entities[-1]["RID"]

        if len(entities) < self.batch_size:
            break

has_table

has_table(table: DerivaTable | str) -> bool

Check if table exists in catalog.

Parameters:

- table (DerivaTable | str, required): Table object or name.

Returns:

- bool: True if table exists in configured schemas.

Source code in src/deriva_ml/model/data_sources.py
def has_table(self, table: DerivaTable | str) -> bool:
    """Check if table exists in catalog.

    Args:
        table: Table object or name.

    Returns:
        True if table exists in configured schemas.
    """
    return self._get_table_info(table) is not None

list_available_tables

list_available_tables() -> list[str]

List all tables in configured schemas.

Returns:

- list[str]: List of fully-qualified table names (schema.table).

Source code in src/deriva_ml/model/data_sources.py
def list_available_tables(self) -> list[str]:
    """List all tables in configured schemas.

    Returns:
        List of fully-qualified table names (schema.table).
    """
    tables = []
    for schema_name in self.schemas:
        if schema_name in self._model.schemas:
            schema = self._model.schemas[schema_name]
            for table_name in schema.tables.keys():
                tables.append(f"{schema_name}.{table_name}")
    return sorted(tables)

ColumnDisplay dataclass

Bases: AnnotationBuilder

Column-display annotation builder.

Controls how column values are rendered.

Example

cd = ColumnDisplay()
cd.default(ColumnDisplayOptions(
    pre_format=PreFormat(format="%.2f")
))

# Markdown link
cd = ColumnDisplay()
cd.default(ColumnDisplayOptions(
    markdown_pattern="[Link]({{{_value}}})"
))

Source code in src/deriva_ml/model/annotations.py
@dataclass
class ColumnDisplay(AnnotationBuilder):
    """Column-display annotation builder.

    Controls how column values are rendered.

    Example:
        >>> cd = ColumnDisplay()
        >>> cd.default(ColumnDisplayOptions(
        ...     pre_format=PreFormat(format="%.2f")
        ... ))
        >>>
        >>> # Markdown link
        >>> cd = ColumnDisplay()
        >>> cd.default(ColumnDisplayOptions(
        ...     markdown_pattern="[Link]({{{_value}}})"
        ... ))
    """
    tag = TAG_COLUMN_DISPLAY

    _contexts: dict[str, ColumnDisplayOptions | str] = field(default_factory=dict)

    def set_context(
        self,
        context: str,
        options: ColumnDisplayOptions | str
    ) -> "ColumnDisplay":
        """Set options for a context."""
        self._contexts[context] = options
        return self

    def default(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
        """Set default options."""
        return self.set_context(CONTEXT_DEFAULT, options)

    def compact(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
        """Set options for compact view."""
        return self.set_context(CONTEXT_COMPACT, options)

    def detailed(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
        """Set options for detailed view."""
        return self.set_context(CONTEXT_DETAILED, options)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, options in self._contexts.items():
            if isinstance(options, str):
                result[context] = options
            else:
                result[context] = options.to_dict()
        return result
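ColumnDisplay follows a fluent-builder pattern: every setter returns self, so context calls chain, and to_dict() emits the annotation payload. A stripped-down standalone version, with the options objects simplified to plain dicts:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class DisplayBuilder:
    _contexts: dict[str, Any] = field(default_factory=dict)

    def set_context(self, context: str, options: Any) -> "DisplayBuilder":
        self._contexts[context] = options
        return self  # returning self is what enables chaining

    def to_dict(self) -> dict[str, Any]:
        return dict(self._contexts)

annotation = (
    DisplayBuilder()
    .set_context("*", {"pre_format": {"format": "%.2f"}})
    .set_context("compact", {"markdown_pattern": "{{{_value}}}"})
    .to_dict()
)
print(sorted(annotation))  # ['*', 'compact']
```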

compact

compact(
    options: ColumnDisplayOptions,
) -> "ColumnDisplay"

Set options for compact view.

Source code in src/deriva_ml/model/annotations.py
1132
1133
1134
def compact(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
    """Set options for compact view."""
    return self.set_context(CONTEXT_COMPACT, options)

default

default(
    options: ColumnDisplayOptions,
) -> "ColumnDisplay"

Set default options.

Source code in src/deriva_ml/model/annotations.py
def default(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
    """Set default options."""
    return self.set_context(CONTEXT_DEFAULT, options)

detailed

detailed(
    options: ColumnDisplayOptions,
) -> "ColumnDisplay"

Set options for detailed view.

Source code in src/deriva_ml/model/annotations.py
def detailed(self, options: ColumnDisplayOptions) -> "ColumnDisplay":
    """Set options for detailed view."""
    return self.set_context(CONTEXT_DETAILED, options)

set_context

set_context(
    context: str,
    options: ColumnDisplayOptions | str,
) -> "ColumnDisplay"

Set options for a context.

Source code in src/deriva_ml/model/annotations.py
def set_context(
    self,
    context: str,
    options: ColumnDisplayOptions | str
) -> "ColumnDisplay":
    """Set options for a context."""
    self._contexts[context] = options
    return self

ColumnDisplayOptions dataclass

Options for displaying a column in a specific context.

Parameters:

pre_format (PreFormat | None): Pre-formatting options. Default: None
markdown_pattern (str | None): Template for rendering. Default: None
template_engine (TemplateEngine | None): Template engine to use. Default: None
column_order (list[SortKey] | Literal[False] | None): Sort order, or False to disable. Default: None
Source code in src/deriva_ml/model/annotations.py
@dataclass
class ColumnDisplayOptions:
    """Options for displaying a column in a specific context.

    Args:
        pre_format: Pre-formatting options
        markdown_pattern: Template for rendering
        template_engine: Template engine to use
        column_order: Sort order, or False to disable
    """
    pre_format: PreFormat | None = None
    markdown_pattern: str | None = None
    template_engine: TemplateEngine | None = None
    column_order: list[SortKey] | Literal[False] | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.pre_format is not None:
            result["pre_format"] = self.pre_format.to_dict()
        if self.markdown_pattern is not None:
            result["markdown_pattern"] = self.markdown_pattern
        if self.template_engine is not None:
            result["template_engine"] = self.template_engine.value
        if self.column_order is not None:
            if self.column_order is False:
                result["column_order"] = False
            else:
                result["column_order"] = [
                    k.to_dict() if isinstance(k, SortKey) else k
                    for k in self.column_order
                ]
        return result

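The annotation produced by to_dict is a plain mapping from display context to options. A minimal sketch of the resulting payload shape, assuming CONTEXT_DEFAULT resolves to the Deriva "*" context key and using the option names shown above:

```python
# Sketch of the dict a ColumnDisplay annotation resolves to: context
# keys ("*", "compact") mapped to serialized ColumnDisplayOptions.
annotation = {
    "*": {"pre_format": {"format": "%.2f"}},
    "compact": {"markdown_pattern": "[Link]({{{_value}}})"},
}

# The default context formats values; the compact context renders a link.
assert annotation["*"]["pre_format"]["format"] == "%.2f"
assert "markdown_pattern" in annotation["compact"]
```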
DataLoader

Loads data into a database with FK ordering.

Phase 2 of the two-phase database creation pattern. Takes a SchemaORM (from Phase 1) and populates it from a DataSource.

Automatically orders tables by FK dependencies to ensure referential integrity during loading.

Example

Phase 1: Create ORM

orm = SchemaBuilder(model, schemas).build()

Phase 2: Fill with data from bag

source = BagDataSource(bag_path)
loader = DataLoader(orm, source)
counts = loader.load_tables()  # All tables
print(f"Loaded {sum(counts.values())} total rows")

Or load specific tables

counts = loader.load_tables(['Subject', 'Image'])

With progress callback

def on_progress(table, count, total):
    print(f"Loaded {table}: {count} rows")

loader.load_tables(progress_callback=on_progress)

Source code in src/deriva_ml/model/data_loader.py
class DataLoader:
    """Loads data into a database with FK ordering.

    Phase 2 of the two-phase database creation pattern. Takes a
    SchemaORM (from Phase 1) and populates it from a DataSource.

    Automatically orders tables by FK dependencies to ensure
    referential integrity during loading.

    Example:
        # Phase 1: Create ORM
        orm = SchemaBuilder(model, schemas).build()

        # Phase 2: Fill with data from bag
        source = BagDataSource(bag_path)
        loader = DataLoader(orm, source)
        counts = loader.load_tables()  # All tables
        print(f"Loaded {sum(counts.values())} total rows")

        # Or load specific tables
        counts = loader.load_tables(['Subject', 'Image'])

        # With progress callback
        def on_progress(table, count, total):
            print(f"Loaded {table}: {count} rows")
        loader.load_tables(progress_callback=on_progress)
    """

    def __init__(
        self,
        schema_orm: SchemaORM,
        data_source: DataSource,
    ):
        """Initialize the loader.

        Args:
            schema_orm: ORM structure from SchemaBuilder.
            data_source: Source of data to load (BagDataSource, CatalogDataSource, etc.).
        """
        self.orm = schema_orm
        self.source = data_source
        self.orderer = ForeignKeyOrderer(
            schema_orm.model,
            schema_orm.schemas,
        )

    def load_tables(
        self,
        tables: list[str | DerivaTable] | None = None,
        on_conflict: str = "ignore",
        batch_size: int = 1000,
        progress_callback: Callable[[str, int, int], None] | None = None,
    ) -> dict[str, int]:
        """Load data into specified tables with FK ordering.

        Tables are automatically ordered by FK dependencies to ensure
        referenced tables are populated first.

        Args:
            tables: Tables to load. If None, loads all tables that have
                data in the source.
            on_conflict: How to handle duplicate keys:
                - "ignore": Skip rows with duplicate keys (default)
                - "replace": Replace existing rows
                - "error": Raise error on duplicates
            batch_size: Number of rows per insert batch.
            progress_callback: Optional callback(table_name, rows_loaded, total_tables)
                called after each table is loaded.

        Returns:
            Dict mapping table names to row counts loaded.
        """
        # Determine tables to load
        if tables is None:
            # Get all tables that have data in source
            available = set(self.source.list_available_tables())
            # Filter to tables that exist in ORM
            orm_tables = set(self.orm.list_tables())

            # Match available tables to ORM tables
            tables_to_load = []
            for orm_table in orm_tables:
                # Check both qualified and unqualified names
                table_name = orm_table.split(".")[-1]
                if orm_table in available or table_name in available:
                    tables_to_load.append(orm_table)
        else:
            tables_to_load = [
                t if isinstance(t, str) else f"{t.schema.name}.{t.name}"
                for t in tables
            ]

        # Compute insertion order
        try:
            ordered_tables = self.orderer.get_insertion_order(tables_to_load)
        except ValueError as e:
            # Some tables might not be in the model, just use original order
            logger.warning(f"Could not compute FK ordering: {e}")
            ordered_tables = [
                self.orderer._to_table(t) if isinstance(t, str) else t
                for t in tables_to_load
                if self._table_exists(t)
            ]

        # Load in order
        counts = {}
        total_tables = len(ordered_tables)

        for i, table in enumerate(ordered_tables):
            table_key = f"{table.schema.name}.{table.name}"

            count = self._load_table(table, on_conflict, batch_size)
            counts[table_key] = count

            if progress_callback:
                progress_callback(table_key, count, total_tables)

            if count > 0:
                logger.info(f"Loaded {count} rows into {table_key}")

        return counts

    def _table_exists(self, table: str | DerivaTable) -> bool:
        """Check if table exists in ORM."""
        try:
            if isinstance(table, str):
                self.orm.find_table(table)
            else:
                self.orm.find_table(f"{table.schema.name}.{table.name}")
            return True
        except KeyError:
            return False

    def _load_table(
        self,
        table: DerivaTable,
        on_conflict: str,
        batch_size: int,
    ) -> int:
        """Load a single table.

        Args:
            table: Table to load.
            on_conflict: Conflict handling strategy.
            batch_size: Rows per batch.

        Returns:
            Number of rows loaded.
        """
        table_key = f"{table.schema.name}.{table.name}"

        # Find SQL table
        try:
            sql_table = self.orm.find_table(table_key)
        except KeyError:
            logger.warning(f"Table {table_key} not found in ORM")
            return 0

        # Check if source has data
        if not self.source.has_table(table):
            logger.debug(f"No data for {table_key} in source")
            return 0

        # Get data from source
        rows_loaded = 0
        batch = []

        with self.orm.engine.begin() as conn:
            for row in self.source.get_table_data(table):
                batch.append(row)

                if len(batch) >= batch_size:
                    rows_loaded += self._insert_batch(
                        conn, sql_table, batch, on_conflict
                    )
                    batch = []

            # Insert remaining rows
            if batch:
                rows_loaded += self._insert_batch(
                    conn, sql_table, batch, on_conflict
                )

        return rows_loaded

    def _insert_batch(
        self,
        conn: Any,
        sql_table: Any,
        rows: list[dict[str, Any]],
        on_conflict: str,
    ) -> int:
        """Insert a batch of rows.

        Args:
            conn: Database connection.
            sql_table: SQLAlchemy table.
            rows: List of row dictionaries.
            on_conflict: Conflict handling strategy.

        Returns:
            Number of rows inserted.
        """
        if not rows:
            return 0

        try:
            if on_conflict == "ignore":
                stmt = sqlite_insert(sql_table).on_conflict_do_nothing()
            elif on_conflict == "replace":
                # For SQLite, we need to specify all columns for upsert
                stmt = sqlite_insert(sql_table)
                update_cols = {
                    c.name: c for c in stmt.excluded
                    if c.name not in ("RID",)  # Don't update primary key
                }
                stmt = stmt.on_conflict_do_update(
                    index_elements=["RID"],
                    set_=update_cols,
                )
            else:
                stmt = sql_table.insert()

            conn.execute(stmt, rows)
            return len(rows)

        except Exception as e:
            logger.error(f"Error inserting into {sql_table.name}: {e}")
            if on_conflict == "error":
                raise
            return 0

    def load_table(
        self,
        table: str | DerivaTable,
        on_conflict: str = "ignore",
        batch_size: int = 1000,
    ) -> int:
        """Load a single table (without FK ordering).

        Use this when you know the dependencies are already satisfied
        or for loading a single table.

        Args:
            table: Table to load.
            on_conflict: Conflict handling strategy.
            batch_size: Rows per batch.

        Returns:
            Number of rows loaded.
        """
        if isinstance(table, str):
            table = self.orderer._to_table(table)

        return self._load_table(table, on_conflict, batch_size)

    def get_load_order(
        self,
        tables: list[str | DerivaTable] | None = None,
    ) -> list[str]:
        """Get the FK-safe load order for tables without loading.

        Useful for previewing or manually controlling load order.

        Args:
            tables: Tables to order. If None, orders all available.

        Returns:
            List of table names in safe insertion order.
        """
        if tables is None:
            available = self.source.list_available_tables()
            tables = [t for t in available if self._table_exists(t)]

        ordered = self.orderer.get_insertion_order(tables)
        return [f"{t.schema.name}.{t.name}" for t in ordered]

    def validate_load_order(
        self,
        tables: list[str | DerivaTable],
    ) -> list[tuple[str, str, str]]:
        """Validate that tables can be loaded in the given order.

        Args:
            tables: Ordered list of tables.

        Returns:
            List of FK violations as (table, missing_dep, fk_name) tuples.
            Empty if order is valid.
        """
        return self.orderer.validate_insertion_order(tables)

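The "ignore" conflict strategy above compiles to SQLite's ON CONFLICT DO NOTHING via SQLAlchemy's sqlite_insert. A stdlib-only sketch of the same semantics, using sqlite3 and the equivalent INSERT OR IGNORE form (table and column names are hypothetical):

```python
import sqlite3

# In-memory database with a RID primary key, mirroring a Deriva table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Subject (RID TEXT PRIMARY KEY, Name TEXT)")

# Second row duplicates RID "1" and is silently skipped, as with
# on_conflict="ignore" in DataLoader.
rows = [("1", "a"), ("1", "b"), ("2", "c")]
conn.executemany("INSERT OR IGNORE INTO Subject (RID, Name) VALUES (?, ?)", rows)

loaded = conn.execute("SELECT RID, Name FROM Subject ORDER BY RID").fetchall()
print(loaded)  # [('1', 'a'), ('2', 'c')]
```

With "replace" the loader instead issues an upsert that updates all non-RID columns, and with "error" a plain insert lets the duplicate-key error propagate.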
__init__

__init__(
    schema_orm: SchemaORM,
    data_source: DataSource,
)

Initialize the loader.

Parameters:

schema_orm (SchemaORM): ORM structure from SchemaBuilder. Required.
data_source (DataSource): Source of data to load (BagDataSource, CatalogDataSource, etc.). Required.
Source code in src/deriva_ml/model/data_loader.py
def __init__(
    self,
    schema_orm: SchemaORM,
    data_source: DataSource,
):
    """Initialize the loader.

    Args:
        schema_orm: ORM structure from SchemaBuilder.
        data_source: Source of data to load (BagDataSource, CatalogDataSource, etc.).
    """
    self.orm = schema_orm
    self.source = data_source
    self.orderer = ForeignKeyOrderer(
        schema_orm.model,
        schema_orm.schemas,
    )

get_load_order

get_load_order(
    tables: list[str | Table] | None = None,
) -> list[str]

Get the FK-safe load order for tables without loading.

Useful for previewing or manually controlling load order.

Parameters:

tables (list[str | Table] | None): Tables to order. If None, orders all available. Default: None

Returns:

list[str]: List of table names in safe insertion order.

Source code in src/deriva_ml/model/data_loader.py
def get_load_order(
    self,
    tables: list[str | DerivaTable] | None = None,
) -> list[str]:
    """Get the FK-safe load order for tables without loading.

    Useful for previewing or manually controlling load order.

    Args:
        tables: Tables to order. If None, orders all available.

    Returns:
        List of table names in safe insertion order.
    """
    if tables is None:
        available = self.source.list_available_tables()
        tables = [t for t in available if self._table_exists(t)]

    ordered = self.orderer.get_insertion_order(tables)
    return [f"{t.schema.name}.{t.name}" for t in ordered]

load_table

load_table(
    table: str | Table,
    on_conflict: str = "ignore",
    batch_size: int = 1000,
) -> int

Load a single table (without FK ordering).

Use this when you know the dependencies are already satisfied or for loading a single table.

Parameters:

table (str | Table): Table to load. Required.
on_conflict (str): Conflict handling strategy. Default: 'ignore'
batch_size (int): Rows per batch. Default: 1000

Returns:

int: Number of rows loaded.

Source code in src/deriva_ml/model/data_loader.py
def load_table(
    self,
    table: str | DerivaTable,
    on_conflict: str = "ignore",
    batch_size: int = 1000,
) -> int:
    """Load a single table (without FK ordering).

    Use this when you know the dependencies are already satisfied
    or for loading a single table.

    Args:
        table: Table to load.
        on_conflict: Conflict handling strategy.
        batch_size: Rows per batch.

    Returns:
        Number of rows loaded.
    """
    if isinstance(table, str):
        table = self.orderer._to_table(table)

    return self._load_table(table, on_conflict, batch_size)

load_tables

load_tables(
    tables: list[str | Table] | None = None,
    on_conflict: str = "ignore",
    batch_size: int = 1000,
    progress_callback: Callable[[str, int, int], None] | None = None,
) -> dict[str, int]

Load data into specified tables with FK ordering.

Tables are automatically ordered by FK dependencies to ensure referenced tables are populated first.

Parameters:

tables (list[str | Table] | None): Tables to load. If None, loads all tables that have data in the source. Default: None
on_conflict (str): How to handle duplicate keys: "ignore" skips rows with duplicate keys (default), "replace" replaces existing rows, "error" raises an error on duplicates. Default: 'ignore'
batch_size (int): Number of rows per insert batch. Default: 1000
progress_callback (Callable[[str, int, int], None] | None): Optional callback(table_name, rows_loaded, total_tables) called after each table is loaded. Default: None

Returns:

dict[str, int]: Dict mapping table names to row counts loaded.

Source code in src/deriva_ml/model/data_loader.py
def load_tables(
    self,
    tables: list[str | DerivaTable] | None = None,
    on_conflict: str = "ignore",
    batch_size: int = 1000,
    progress_callback: Callable[[str, int, int], None] | None = None,
) -> dict[str, int]:
    """Load data into specified tables with FK ordering.

    Tables are automatically ordered by FK dependencies to ensure
    referenced tables are populated first.

    Args:
        tables: Tables to load. If None, loads all tables that have
            data in the source.
        on_conflict: How to handle duplicate keys:
            - "ignore": Skip rows with duplicate keys (default)
            - "replace": Replace existing rows
            - "error": Raise error on duplicates
        batch_size: Number of rows per insert batch.
        progress_callback: Optional callback(table_name, rows_loaded, total_tables)
            called after each table is loaded.

    Returns:
        Dict mapping table names to row counts loaded.
    """
    # Determine tables to load
    if tables is None:
        # Get all tables that have data in source
        available = set(self.source.list_available_tables())
        # Filter to tables that exist in ORM
        orm_tables = set(self.orm.list_tables())

        # Match available tables to ORM tables
        tables_to_load = []
        for orm_table in orm_tables:
            # Check both qualified and unqualified names
            table_name = orm_table.split(".")[-1]
            if orm_table in available or table_name in available:
                tables_to_load.append(orm_table)
    else:
        tables_to_load = [
            t if isinstance(t, str) else f"{t.schema.name}.{t.name}"
            for t in tables
        ]

    # Compute insertion order
    try:
        ordered_tables = self.orderer.get_insertion_order(tables_to_load)
    except ValueError as e:
        # Some tables might not be in the model, just use original order
        logger.warning(f"Could not compute FK ordering: {e}")
        ordered_tables = [
            self.orderer._to_table(t) if isinstance(t, str) else t
            for t in tables_to_load
            if self._table_exists(t)
        ]

    # Load in order
    counts = {}
    total_tables = len(ordered_tables)

    for i, table in enumerate(ordered_tables):
        table_key = f"{table.schema.name}.{table.name}"

        count = self._load_table(table, on_conflict, batch_size)
        counts[table_key] = count

        if progress_callback:
            progress_callback(table_key, count, total_tables)

        if count > 0:
            logger.info(f"Loaded {count} rows into {table_key}")

    return counts

validate_load_order

validate_load_order(
    tables: list[str | Table],
) -> list[tuple[str, str, str]]

Validate that tables can be loaded in the given order.

Parameters:

tables (list[str | Table]): Ordered list of tables. Required.

Returns:

list[tuple[str, str, str]]: List of FK violations as (table, missing_dep, fk_name) tuples. Empty if the order is valid.

Source code in src/deriva_ml/model/data_loader.py
def validate_load_order(
    self,
    tables: list[str | DerivaTable],
) -> list[tuple[str, str, str]]:
    """Validate that tables can be loaded in the given order.

    Args:
        tables: Ordered list of tables.

    Returns:
        List of FK violations as (table, missing_dep, fk_name) tuples.
        Empty if order is valid.
    """
    return self.orderer.validate_insertion_order(tables)

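An FK-safe insertion order is a topological sort of the table dependency graph, in which each table comes after every table it references. A minimal sketch of what ForeignKeyOrderer computes, using Kahn's algorithm over a hypothetical FK graph (table names and the dependency dict are illustrative, not the real API):

```python
from collections import deque

# Hypothetical FK graph: table -> set of tables it references,
# which must therefore be loaded first.
fks = {
    "Image": {"Subject"},
    "Subject": {"Study"},
    "Study": set(),
    "Tag": set(),
}

def insertion_order(graph):
    """Kahn's algorithm: emit a table once all tables it references are emitted."""
    pending = {t: set(deps) for t, deps in graph.items()}
    ready = deque(sorted(t for t, deps in pending.items() if not deps))
    order = []
    while ready:
        t = ready.popleft()
        order.append(t)
        for other, deps in pending.items():
            if t in deps:
                deps.discard(t)
                if not deps and other not in order and other not in ready:
                    ready.append(other)
    if len(order) != len(graph):
        # Cyclic references (e.g. self-referential FKs) need special handling.
        raise ValueError("cycle in FK graph")
    return order

print(insertion_order(fks))  # ['Study', 'Tag', 'Subject', 'Image']
```

Study precedes Subject, which precedes Image, so every insert finds its referenced rows already present.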
DataSource

Bases: Protocol

Protocol for data sources that can fill a database.

Implementations provide data for populating SQLite tables from different sources (bags, remote catalogs, etc.).

This is used with DataLoader in Phase 2 of the two-phase pattern.

Source code in src/deriva_ml/model/data_sources.py
@runtime_checkable
class DataSource(Protocol):
    """Protocol for data sources that can fill a database.

    Implementations provide data for populating SQLite tables from
    different sources (bags, remote catalogs, etc.).

    This is used with DataLoader in Phase 2 of the two-phase pattern.
    """

    def get_table_data(
        self,
        table: DerivaTable | str,
    ) -> Iterator[dict[str, Any]]:
        """Yield rows for a table as dictionaries.

        Args:
            table: Table object or name to get data for.

        Yields:
            Dictionary per row with column names as keys.
        """
        ...

    def has_table(self, table: DerivaTable | str) -> bool:
        """Check if this source has data for the table.

        Args:
            table: Table object or name to check.

        Returns:
            True if data is available for this table.
        """
        ...

    def list_available_tables(self) -> list[str]:
        """List tables with available data.

        Returns:
            List of table names (may include schema prefix).
        """
        ...

get_table_data

get_table_data(
    table: Table | str,
) -> Iterator[dict[str, Any]]

Yield rows for a table as dictionaries.

Parameters:

table (Table | str): Table object or name to get data for. Required.

Yields:

dict[str, Any]: Dictionary per row with column names as keys.

Source code in src/deriva_ml/model/data_sources.py
def get_table_data(
    self,
    table: DerivaTable | str,
) -> Iterator[dict[str, Any]]:
    """Yield rows for a table as dictionaries.

    Args:
        table: Table object or name to get data for.

    Yields:
        Dictionary per row with column names as keys.
    """
    ...

has_table

has_table(table: Table | str) -> bool

Check if this source has data for the table.

Parameters:

table (Table | str): Table object or name to check. Required.

Returns:

bool: True if data is available for this table.

Source code in src/deriva_ml/model/data_sources.py
def has_table(self, table: DerivaTable | str) -> bool:
    """Check if this source has data for the table.

    Args:
        table: Table object or name to check.

    Returns:
        True if data is available for this table.
    """
    ...

list_available_tables

list_available_tables() -> list[str]

List tables with available data.

Returns:

list[str]: List of table names (may include schema prefix).

Source code in src/deriva_ml/model/data_sources.py
def list_available_tables(self) -> list[str]:
    """List tables with available data.

    Returns:
        List of table names (may include schema prefix).
    """
    ...

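Because DataSource is a runtime-checkable Protocol, any object with these three methods satisfies it structurally, with no inheritance required. A minimal sketch of a conforming in-memory source (the DictDataSource class and its data are hypothetical; the Protocol is restated from above for a self-contained example):

```python
from typing import Any, Iterator, Protocol, runtime_checkable

@runtime_checkable
class DataSource(Protocol):
    """Restated sketch of the protocol shown above."""
    def get_table_data(self, table) -> Iterator[dict[str, Any]]: ...
    def has_table(self, table) -> bool: ...
    def list_available_tables(self) -> list[str]: ...

class DictDataSource:
    """Hypothetical source backed by an in-memory dict of row lists."""

    def __init__(self, tables: dict[str, list[dict[str, Any]]]):
        self._tables = tables

    def get_table_data(self, table) -> Iterator[dict[str, Any]]:
        yield from self._tables.get(str(table), [])

    def has_table(self, table) -> bool:
        return str(table) in self._tables

    def list_available_tables(self) -> list[str]:
        return list(self._tables)

source = DictDataSource({"Subject": [{"RID": "1"}]})
# runtime_checkable lets isinstance verify the methods are present.
assert isinstance(source, DataSource)
assert list(source.get_table_data("Subject")) == [{"RID": "1"}]
```

A real implementation such as BagDataSource would stream rows from CSV files in a BDBag instead of a dict, but the interface seen by DataLoader is identical.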
DerivaModel

Augmented interface to the deriva model class.

This class provides a number of DerivaML-specific methods that augment the interface of the deriva model class.

Attributes:

model: ERMRest model for the catalog.
catalog (ErmrestCatalog): ERMRest catalog for the model.
hostname: Hostname of the ERMRest server.
ml_schema: The ML schema name for the catalog.
domain_schemas: Frozenset of all domain schema names in the catalog.
default_schema: The default schema for table creation operations.
Source code in src/deriva_ml/model/catalog.py
1636
1637
1638
1639
1640
1641
1642
1643
1644
1645
1646
1647
1648
1649
1650
1651
1652
1653
1654
1655
1656
1657
1658
1659
1660
1661
1662
1663
1664
1665
1666
1667
1668
1669
1670
1671
1672
1673
1674
1675
1676
1677
1678
1679
1680
1681
1682
1683
1684
1685
1686
1687
1688
1689
1690
1691
1692
1693
1694
1695
1696
1697
1698
1699
1700
1701
1702
1703
1704
1705
1706
1707
1708
1709
1710
1711
1712
1713
1714
1715
1716
1717
1718
1719
1720
1721
1722
1723
1724
1725
1726
1727
1728
1729
1730
1731
1732
1733
1734
1735
1736
1737
1738
1739
1740
1741
1742
1743
1744
1745
1746
1747
1748
1749
1750
1751
1752
1753
1754
1755
1756
1757
1758
1759
1760
1761
1762
1763
1764
1765
1766
1767
1768
1769
1770
1771
1772
1773
1774
1775
1776
1777
1778
1779
1780
1781
1782
1783
1784
1785
1786
1787
1788
1789
1790
1791
1792
1793
1794
1795
1796
1797
1798
1799
1800
1801
1802
1803
1804
1805
1806
1807
1808
1809
1810
1811
1812
1813
1814
1815
1816
1817
1818
1819
1820
1821
1822
1823
1824
1825
1826
1827
1828
1829
1830
1831
1832
1833
1834
1835
1836
1837
1838
1839
1840
1841
1842
1843
1844
1845
1846
1847
1848
1849
1850
1851
1852
1853
1854
1855
1856
1857
1858
1859
1860
1861
1862
1863
1864
1865
1866
1867
1868
1869
1870
1871
1872
1873
1874
1875
1876
1877
1878
1879
1880
1881
1882
1883
1884
1885
1886
1887
1888
1889
1890
1891
1892
1893
1894
1895
1896
1897
1898
1899
1900
1901
1902
1903
1904
1905
1906
1907
1908
1909
1910
1911
1912
1913
1914
1915
1916
1917
1918
1919
1920
1921
1922
1923
1924
1925
1926
1927
1928
1929
1930
1931
1932
1933
1934
1935
1936
1937
1938
1939
1940
1941
1942
1943
1944
1945
1946
1947
1948
1949
1950
1951
1952
1953
1954
1955
1956
1957
1958
1959
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
2025
2026
2027
2028
2029
2030
2031
2032
2033
2034
2035
2036
2037
2038
2039
2040
2041
2042
2043
2044
2045
2046
2047
2048
2049
2050
2051
2052
2053
2054
2055
2056
2057
2058
2059
2060
2061
2062
2063
2064
2065
2066
2067
2068
2069
2070
2071
2072
2073
2074
2075
2076
2077
2078
2079
2080
2081
2082
2083
2084
2085
class DerivaModel:
    """Augmented interface to deriva model class.

    This class provides a number of DerivaML specific methods that augment the interface in the deriva model class.

    Attributes:
        model: ERMRest model for the catalog.
        catalog: ERMRest catalog for the model.
        hostname: Hostname of the ERMRest server.
        ml_schema: The ML schema name for the catalog.
        domain_schemas: Frozenset of all domain schema names in the catalog.
        default_schema: The default schema for table creation operations.

    """

    def __init__(
        self,
        model: Model,
        ml_schema: str = ML_SCHEMA,
        domain_schemas: str | set[str] | None = None,
        default_schema: str | None = None,
    ):
        """Create and initialize a DerivaModel instance.

        This method will connect to a catalog and initialize schema configuration.
        This class is intended to be used as a base class on which domain-specific interfaces are built.

        Args:
            model: The ERMRest model for the catalog.
            ml_schema: The ML schema name.
            domain_schemas: Optional explicit set of domain schema names. If None,
                auto-detects all non-system schemas.
            default_schema: The default schema for table creation operations. If None
                and there is exactly one domain schema, that schema is used as default.
                If there are multiple domain schemas, default_schema must be specified.
        """
        self.model = model
        self.configuration = None
        self.catalog: ErmrestCatalog = self.model.catalog
        self.hostname = self.catalog.deriva_server.server if isinstance(self.catalog, ErmrestCatalog) else "localhost"

        self.ml_schema = ml_schema
        self._system_schemas = frozenset(SYSTEM_SCHEMAS | {ml_schema})

        # Determine domain schemas
        if domain_schemas is not None:
            if isinstance(domain_schemas, str):
                domain_schemas = {domain_schemas}
            self.domain_schemas = frozenset(domain_schemas)
        else:
            # Auto-detect all domain schemas
            self.domain_schemas = _get_domain_schemas(self.model.schemas.keys(), ml_schema)

        # Determine default schema for table creation
        if default_schema is not None:
            if default_schema not in self.domain_schemas:
                raise DerivaMLException(
                    f"default_schema '{default_schema}' is not in domain_schemas: {self.domain_schemas}"
                )
            self.default_schema = default_schema
        elif len(self.domain_schemas) == 1:
            # Single domain schema - use it as default
            self.default_schema = next(iter(self.domain_schemas))
        elif len(self.domain_schemas) == 0:
            # No domain schemas - default_schema will be None
            self.default_schema = None
        else:
            # Multiple domain schemas, no explicit default
            self.default_schema = None

    @classmethod
    def from_cached(
        cls,
        schema_dict: dict,
        *,
        catalog,
        ml_schema: str = ML_SCHEMA,
        domain_schemas: "str | set[str] | None" = None,
        default_schema: "str | None" = None,
    ) -> "DerivaModel":
        """Construct a DerivaModel from a cached ermrest /schema dict.

        No network is touched. The ``catalog`` argument is passed to
        deriva-py's ``Model(catalog, model_doc)`` constructor as the
        first positional argument; in offline mode it will be a
        :class:`~deriva_ml.core.catalog_stub.CatalogStub`, in online
        mode it is a real ``ErmrestCatalog``. ``DerivaModel.__init__``
        then reads the catalog back off ``model.catalog`` as usual.

        This replicates what ``Model.fromcatalog(catalog)`` does
        online — the online call fetches
        ``catalog.get("/schema").json()`` and passes the result to
        ``Model(catalog, dict)``. Here we pass in the already-cached
        dict from :class:`~deriva_ml.core.schema_cache.SchemaCache`.

        Args:
            schema_dict: The JSON payload from a previous
                ``catalog.get('/schema').json()`` call, as persisted
                by ``SchemaCache``.
            catalog: The catalog object to associate with the model.
                Pass a real ``ErmrestCatalog`` online, or a
                ``CatalogStub`` offline.
            ml_schema: ML schema name (default ``"deriva-ml"``).
            domain_schemas: Optional explicit set of domain schema
                names. If None, auto-detects all non-system schemas
                from the cached dict.
            default_schema: Optional default schema name.

        Returns:
            A ``DerivaModel`` wrapping a deriva-py ``Model``
            reconstructed from the dict.
        """
        from deriva.core.ermrest_model import Model

        # Model.__init__(catalog, model_doc) stores catalog as
        # self._catalog and exposes it via the .catalog property;
        # DerivaModel.__init__ then reads self.model.catalog.
        model = Model(catalog, schema_dict)
        return cls(
            model,
            ml_schema=ml_schema,
            domain_schemas=domain_schemas,
            default_schema=default_schema,
        )

    def is_system_schema(self, schema_name: str) -> bool:
        """Check if a schema is a system or ML schema.

        Args:
            schema_name: Name of the schema to check.

        Returns:
            True if the schema is a system or ML schema.
        """
        return _is_system_schema(schema_name, self.ml_schema)

    def is_domain_schema(self, schema_name: str) -> bool:
        """Check if a schema is a domain schema.

        Args:
            schema_name: Name of the schema to check.

        Returns:
            True if the schema is a domain schema.
        """
        return schema_name in self.domain_schemas

    def _require_default_schema(self) -> str:
        """Get default schema, raising an error if not set.

        Returns:
            The default schema name.

        Raises:
            DerivaMLException: If default_schema is not set.
        """
        if self.default_schema is None:
            raise DerivaMLException(
                f"No default_schema set. With multiple domain schemas {self.domain_schemas}, "
                "you must either specify a default_schema when creating DerivaML or "
                "pass an explicit schema parameter to this method."
            )
        return self.default_schema

    def refresh_model(self) -> None:
        self.model = self.catalog.getCatalogModel()

    @property
    def chaise_config(self) -> dict[str, Any]:
        """Return the chaise configuration."""
        return self.model.chaise_config

    def get_schema_description(self, include_system_columns: bool = False) -> dict[str, Any]:
        """Return a JSON description of the catalog schema structure.

        Provides a structured representation of the domain and ML schemas including
        tables, columns, foreign keys, and relationships. Useful for understanding
        the data model structure programmatically.

        Args:
            include_system_columns: If True, include RID, RCT, RMT, RCB, RMB columns.
                Default False to reduce output size.

        Returns:
            Dictionary with schema structure:
            {
                "domain_schemas": ["schema_name1", "schema_name2"],
                "default_schema": "schema_name1",
                "ml_schema": "deriva-ml",
                "schemas": {
                    "schema_name": {
                        "tables": {
                            "TableName": {
                                "comment": "description",
                                "is_vocabulary": bool,
                                "is_asset": bool,
                                "is_association": bool,
                                "columns": [...],
                                "foreign_keys": [...],
                                "features": [...]
                            }
                        }
                    }
                }
            }
        """
        system_columns = {"RID", "RCT", "RMT", "RCB", "RMB"}
        result = {
            "domain_schemas": sorted(self.domain_schemas),
            "default_schema": self.default_schema,
            "ml_schema": self.ml_schema,
            "schemas": {},
        }

        # Include all domain schemas and the ML schema
        for schema_name in [*self.domain_schemas, self.ml_schema]:
            schema = self.model.schemas.get(schema_name)
            if not schema:
                continue

            schema_info = {"tables": {}}

            for table_name, table in schema.tables.items():
                # Get columns
                columns = []
                for col in table.columns:
                    if not include_system_columns and col.name in system_columns:
                        continue
                    columns.append(
                        {
                            "name": col.name,
                            "type": str(col.type.typename),
                            "nullok": col.nullok,
                            "comment": col.comment or "",
                        }
                    )

                # Get foreign keys
                foreign_keys = []
                for fk in table.foreign_keys:
                    fk_cols = [c.name for c in fk.foreign_key_columns]
                    ref_cols = [c.name for c in fk.referenced_columns]
                    foreign_keys.append(
                        {
                            "columns": fk_cols,
                            "referenced_table": f"{fk.pk_table.schema.name}.{fk.pk_table.name}",
                            "referenced_columns": ref_cols,
                        }
                    )

                # Get features if this is a domain table
                features = []
                if self.is_domain_schema(schema_name):
                    try:
                        for f in self.find_features(table):
                            features.append(
                                {
                                    "name": f.feature_name,
                                    "feature_table": f.feature_table.name,
                                }
                            )
                    except Exception as e:
                        logger.debug(f"Could not enumerate features for table {table.name}: {e}")

                table_info = {
                    "comment": table.comment or "",
                    "is_vocabulary": self.is_vocabulary(table),
                    "is_asset": self.is_asset(table),
                    "is_association": bool(self.is_association(table)),
                    "columns": columns,
                    "foreign_keys": foreign_keys,
                }
                if features:
                    table_info["features"] = features

                schema_info["tables"][table_name] = table_info

            result["schemas"][schema_name] = schema_info

        return result

    def __getattr__(self, name: str) -> Any:
        # Called only if `name` is not found on DerivaModel. Delegate attribute lookups to the wrapped Model.
        return getattr(self.model, name)

    def name_to_table(self, table: TableInput) -> Table:
        """Return the table object corresponding to the given table name.

        Searches domain schemas first (in sorted order), then ML schema, then WWW.
        If the table name appears in more than one schema, returns the first match.

        Args:
          table: An ERMrest table object or a string that is the name of the table.

        Returns:
          Table object.

        Raises:
          DerivaMLException: If the table doesn't exist in any searchable schema.
        """
        if isinstance(table, Table):
            return table

        # Search domain schemas (sorted for deterministic order), then ML schema, then WWW
        search_order = [*sorted(self.domain_schemas), self.ml_schema, "WWW"]
        for sname in search_order:
            if sname not in self.model.schemas:
                continue
            s = self.model.schemas[sname]
            if table in s.tables:
                return s.tables[table]
        raise DerivaMLException(f"The table {table} doesn't exist.")

    def is_vocabulary(self, table_name: TableInput) -> bool:
        """Check if a given table is a controlled vocabulary table.

        Delegates to ``Table.is_vocabulary()`` in deriva-py, which enforces both
        the required column names AND their types (ermrest_curie, ermrest_uri,
        text, markdown). The type check is stricter than a column-name-only
        check — a table with an ``ID`` column of the wrong type correctly
        returns False here where the legacy name-only implementation would
        have returned True.

        Mirrors :meth:`is_asset`, which already delegates to ``Table.is_asset()``.

        Args:
            table_name: An ERMrest Table object or the name of the table.

        Returns:
            True if the table has the structure of a controlled vocabulary,
            False otherwise.

        Raises:
            DerivaMLException: if the table doesn't exist.
        """
        table = self.name_to_table(table_name)
        return table.is_vocabulary()

    def vocab_columns(self, table_name: TableInput) -> dict[str, str]:
        """Return mapping from canonical vocab column name to actual column name.

        Canonical names are TitleCase (Name, ID, URI, Description, Synonyms).
        Actual names reflect the table's schema — could be lowercase for
        FaceBase-style catalogs or TitleCase for DerivaML-native tables.

        Args:
            table_name: A table object or the name of the table.

        Returns:
            Dict mapping canonical name to actual column name in the table.
            E.g. ``{"Name": "name", "ID": "id", ...}`` for FaceBase tables
            or ``{"Name": "Name", "ID": "ID", ...}`` for DerivaML tables.
        """
        table = self.name_to_table(table_name)
        col_map = {c.name.upper(): c.name for c in table.columns}
        return {canon: col_map[canon.upper()] for canon in ("Name", "ID", "URI", "Description", "Synonyms")}
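The mapping logic above can be exercised on plain strings — a minimal sketch, independent of deriva-py Column objects:

```python
def map_vocab_columns(column_names: list[str]) -> dict[str, str]:
    """Map canonical TitleCase vocab names to the table's actual column names."""
    # Index actual names case-insensitively, then look up each canonical name.
    col_map = {name.upper(): name for name in column_names}
    return {canon: col_map[canon.upper()]
            for canon in ("Name", "ID", "URI", "Description", "Synonyms")}

# FaceBase-style lowercase columns map back to canonical TitleCase keys:
print(map_vocab_columns(["name", "id", "uri", "description", "synonyms"]))
# {'Name': 'name', 'ID': 'id', 'URI': 'uri', 'Description': 'description', 'Synonyms': 'synonyms'}
```

As in the real method, a table missing one of the canonical columns raises a KeyError.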

    def is_association(
        self,
        table_name: str | Table,
        unqualified: bool = True,
        pure: bool = True,
        min_arity: int = 2,
        max_arity: int = 2,
    ) -> bool | set[str] | int:
        """Check the specified table to see if it is an association table.

        Args:
            table_name: param unqualified:
            pure: return: (Default value = True)
            table_name: str | Table:
            unqualified:  (Default value = True)

        Returns:


        """
        table = self.name_to_table(table_name)
        return table.is_association(unqualified=unqualified, pure=pure, min_arity=min_arity, max_arity=max_arity)

    def find_association(self, table1: Table | str, table2: Table | str) -> tuple[Table, Column, Column]:
        """Given two tables, return an association table that connects the two and the two columns used to link them..

        Raises:
            DerivaML exception if there is either not an association table or more than one association table.
        """
        table1 = self.name_to_table(table1)
        table2 = self.name_to_table(table2)

        tables = [
            (a.table, a.self_fkey.columns[0].name, other_key.columns[0].name)
            for a in table1.find_associations(pure=False)
            if len(a.other_fkeys) == 1 and (other_key := a.other_fkeys.pop()).pk_table == table2
        ]

        if len(tables) == 1:
            return tables[0]
        elif len(tables) == 0:
            raise DerivaMLException(f"No association tables found between {table1.name} and {table2.name}.")
        else:
            raise DerivaMLException(
                f"There are {len(tables)} association tables between {table1.name} and {table2.name}."
            )

    def is_asset(self, table_name: TableInput) -> bool:
        """True if the specified table is a proper asset table.

        Delegates to Table.is_asset() from deriva-py which checks:
        - Required columns exist (URL, Filename, Length, MD5)
        - URL, Length, MD5 are NOT NULL
        - URL has the asset annotation

        Args:
            table_name: An ERMrest Table object or the name of the table.

        Returns:
            True if the specified table is a proper asset table.
        """
        table = self.name_to_table(table_name)
        return table.is_asset()

    def find_assets(self, with_metadata: bool = False) -> list[Table]:
        """Return the list of asset tables in the current model"""
        return [t for s in self.model.schemas.values() for t in s.tables.values() if self.is_asset(t)]

    def find_vocabularies(self) -> list[Table]:
        """Return a list of all controlled vocabulary tables in domain and ML schemas."""
        tables = []
        for schema_name in [*self.domain_schemas, self.ml_schema]:
            schema = self.model.schemas.get(schema_name)
            if schema:
                tables.extend(t for t in schema.tables.values() if self.is_vocabulary(t))
        return tables

    @validate_call(config=ConfigDict(arbitrary_types_allowed=True))
    def find_features(self, table: TableInput | None = None) -> Iterable[Feature]:
        """List features in the catalog.

        If a table is specified, returns only features for that table.
        If no table is specified, returns all features across all tables in the catalog.

        Args:
            table: Optional table to find features for. If None, returns all features
                in the catalog.

        Returns:
            An iterable of Feature instances describing the features.
        """

        def is_feature(a: FindAssociationResult) -> bool:
            """Check if association represents a feature.

            Args:
                a: Association result to check
            Returns:
                bool: True if association represents a feature
            """
            return {
                "Feature_Name",
                "Execution",
                a.self_fkey.foreign_key_columns[0].name,
            }.issubset({c.name for c in a.table.columns})

        def find_table_features(t: Table) -> list[Feature]:
            """Find all features for a single table."""
            return [
                Feature(a, self) for a in t.find_associations(min_arity=3, max_arity=3, pure=False) if is_feature(a)
            ]

        if table is not None:
            # Find features for a specific table
            return find_table_features(self.name_to_table(table))
        else:
            # Find all features across all domain and ML schema tables
            features: list[Feature] = []
            for schema_name in [*self.domain_schemas, self.ml_schema]:
                schema = self.model.schemas.get(schema_name)
                if schema:
                    for t in schema.tables.values():
                        features.extend(find_table_features(t))
            return features

    def lookup_feature(self, table: TableInput, feature_name: str) -> Feature:
        """Lookup the named feature associated with the provided table.

        Args:
            table: param feature_name:
            table: str | Table:
            feature_name: str:

        Returns:
            A Feature class that represents the requested feature.

        Raises:
          DerivaMLException: If the feature cannot be found.
        """
        table = self.name_to_table(table)
        try:
            return [f for f in self.find_features(table) if f.feature_name == feature_name][0]
        except IndexError:
            raise DerivaMLException(f"Feature {table.name}:{feature_name} doesn't exist.")

    def asset_metadata(self, table: str | Table) -> set[str]:
        """Return the metadata columns for an asset table."""

        table = self.name_to_table(table)

        if not self.is_asset(table):
            raise DerivaMLTableTypeError("asset table", table.name)
        return {c.name for c in table.columns} - DerivaAssetColumns
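A minimal sketch of this set difference, with a literal stand-in for ``DerivaAssetColumns`` (the exact contents of that constant are assumed here, not taken from the source):

```python
# Hypothetical stand-in for DerivaAssetColumns: the system columns plus the
# standard asset columns checked by is_asset() (URL, Filename, Length, MD5).
ASSET_COLUMNS = {"RID", "RCT", "RMT", "RCB", "RMB", "URL", "Filename", "Length", "MD5"}

def metadata_columns(all_columns: list[str]) -> set[str]:
    # Everything that is not a system/asset column counts as user metadata.
    return set(all_columns) - ASSET_COLUMNS

print(metadata_columns(["RID", "URL", "Filename", "Length", "MD5", "Modality"]))  # {'Modality'}
```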

    def asset_metadata_columns(self, table: str | Table) -> list[Column]:
        """Return Column objects for the asset-metadata columns of ``table``.

        Like :meth:`asset_metadata` but returns the :class:`Column`
        instances (not just names) so callers can inspect attributes
        such as ``nullok``. Results are sorted by column name for
        deterministic iteration.

        Args:
            table: Asset table name or Table object.

        Returns:
            Sorted list of Column objects.

        Raises:
            DerivaMLTableTypeError: If ``table`` is not an asset table.
        """
        table = self.name_to_table(table)
        if not self.is_asset(table):
            raise DerivaMLTableTypeError("asset table", table.name)
        return sorted(
            (c for c in table.columns if c.name not in DerivaAssetColumns),
            key=lambda c: c.name,
        )

    def apply(self) -> None:
        """Call ERMRestModel.apply"""
        if self.catalog == "file-system":
            raise DerivaMLException("Cannot apply() to non-catalog model.")
        else:
            self.model.apply()

    def is_dataset_rid(self, rid: RID, deleted: bool = False) -> bool:
        """Check if a given RID is a dataset RID."""
        try:
            rid_info = self.model.catalog.resolve_rid(rid, self.model)
        except KeyError as e:
            raise DerivaMLException(f"Invalid RID {rid}") from e
        if rid_info.table.name != "Dataset":
            return False
        elif deleted:
            # Got a dataset rid. Now check to see if its deleted or not.
            return True
        else:
            return not list(rid_info.datapath.entities().fetch())[0]["Deleted"]

    def list_dataset_element_types(self) -> list[Table]:
        """
        Lists the data types of elements contained within a dataset.

        This method analyzes the dataset and identifies the data types for all
        elements within it. It is useful for understanding the structure and
        content of the dataset and allows for better manipulation and usage of its
        data.

        Returns:
            list[str]: A list of strings where each string represents a data type
            of an element found in the dataset.

        """

        dataset_table = self.name_to_table("Dataset")

        def is_domain_or_dataset_table(table: Table) -> bool:
            return self.is_domain_schema(table.schema.name) or table.name == dataset_table.name

        return [
            t
            for a in dataset_table.find_associations()
            if is_domain_or_dataset_table(t := a.other_fkeys.pop().pk_table)
        ]

    def _is_association_table(self, name_or_table: str | Table) -> bool:
        """Check if a table is an M:N association (link) table.

        An association table (like ``Dataset_Image`` linking ``Dataset``
        and ``Image``) has exactly two domain FKs pointing at the tables
        it links. Denormalization treats such tables as **transparent
        intermediates**: they're joined through but their columns are
        excluded from the output unless the caller explicitly lists them
        in ``include_tables``.

        **Topology, not purity**: association-ness is determined by the
        FK arity alone, not by whether the table also carries metadata
        columns. Real Deriva linkage tables routinely carry annotation
        data (``Role``, ``Ordinal``, ``Comment``, etc.) while remaining
        semantically M:N bridges — the check must permit them. If the
        user wants those metadata columns in the output, they add the
        table to ``include_tables`` and it's no longer treated as
        transparent (see the ``transparent_intermediates`` logic in
        :meth:`Denormalizer.describe`).

        Stricter than ermrest's built-in ``Table.is_association()`` in
        one direction (we ignore the system FKs RCB/RMB → ERMrest_Client,
        so a 3-arg "association" in ermrest's eyes is usually a real
        M:N table in ours), looser in another (we don't require purity).

        Extracted from a nested function in :meth:`_build_join_tree` so
        the denormalization planner can also use it.

        Args:
            name_or_table: table name (looked up via
                :meth:`name_to_table`) or a :class:`Table` instance.

        Returns:
            ``True`` if the table has exactly 2 domain FKs.

        Example::

            model._is_association_table("Dataset_Image")       # True
            model._is_association_table("Dataset_Image_Role")  # True — extra Role col OK
            model._is_association_table("Image")               # False (has ≤1 FK)
            model._is_association_table("Observation")         # False (has 1 FK)
        """
        try:
            tbl = name_or_table if hasattr(name_or_table, "foreign_keys") else self.name_to_table(name_or_table)
            fks = list(tbl.foreign_keys)
            # Domain FKs exclude the system FKs to ERMrest_Client /
            # ERMrest_Group that every table carries (for RCB/RMB).
            domain_fks = [fk for fk in fks if fk.pk_table.name not in ("ERMrest_Client", "ERMrest_Group")]
            # Association-ness is pure FK-arity topology. Metadata
            # columns on the link table (Role, Ordinal, etc.) don't
            # disqualify it — the user can pull them into output by
            # naming the table in include_tables.
            return len(domain_fks) == 2
        except Exception:
            return False
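The arity test reduces to a filter and a count. Here is a self-contained sketch with a lightweight stand-in for deriva-py foreign-key objects (the `FK` dataclass is hypothetical; only the filtering logic mirrors the method above):

```python
from dataclasses import dataclass

@dataclass
class FK:
    pk_table_name: str  # name of the table this foreign key points at

def is_association(fks: list[FK]) -> bool:
    # Ignore the system FKs (RCB/RMB) to ERMrest_Client / ERMrest_Group;
    # a table is an M:N bridge iff exactly two domain FKs remain.
    domain = [fk for fk in fks if fk.pk_table_name not in ("ERMrest_Client", "ERMrest_Group")]
    return len(domain) == 2

print(is_association([FK("Dataset"), FK("Image"), FK("ERMrest_Client")]))  # True
print(is_association([FK("Subject"), FK("ERMrest_Client")]))               # False
```

Note that metadata columns never enter the decision: only foreign-key topology does.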

    def _fk_neighbors(self, table: str | Table) -> set[Table]:
        """Return FK-neighbor tables of *table* (outbound + inbound, deduplicated).

        The undirected FK-adjacency primitive used by schema traversal.
        Follows both ``table.foreign_keys`` (outbound: tables *table*
        points at) and ``table.referenced_by`` (inbound: tables that
        point at *table*), filters to valid schemas (``domain_schemas ∪
        {ml_schema}``), and deduplicates so that multiple FKs between
        the same two tables count as one edge.

        **Direction-agnostic**: use :meth:`_downstream_fk_sources` for
        the directional (inbound-only) variant when you need to
        distinguish upstream from downstream.

        Extracted from a nested ``find_arcs`` in :meth:`_schema_to_paths`
        so the denormalization planner can reuse it as the FK-traversal
        primitive.

        Args:
            table: table name (looked up via :meth:`name_to_table`) or
                :class:`Table` instance.

        Returns:
            Set of :class:`Table` objects reachable from *table* via one
            FK arc (either direction), deduplicated by target.

        Example::

            # For Image, which has Image.Subject → Subject and is
            # referenced by Dataset_Image.Image:
            model._fk_neighbors("Image")
            # {<Table Subject>, <Table Dataset_Image>}
        """
        tbl = table if hasattr(table, "foreign_keys") else self.name_to_table(table)
        valid_schemas = self.domain_schemas | {self.ml_schema}
        # Outbound edges: tables this table's FKs point at.
        # Inbound edges: tables that have FKs pointing at this table.
        arc_list = [fk.pk_table for fk in tbl.foreign_keys] + [fk.table for fk in tbl.referenced_by]
        # Filter out system/auxiliary schemas (ERMrest_Client, public, etc.).
        arc_list = [t for t in arc_list if t.schema.name in valid_schemas]
        # Deduplicate: multi-FK targets (e.g., two FKs pointing at the
        # same table) should count as one neighbor; the set constructor
        # handles this directly. Downstream callers handle specific FK
        # selection via :meth:`_table_relationship`.
        return set(arc_list)

    def _build_join_tree(
        self,
        element_name: str,
        include_tables: set[str],
        all_paths: list[list[Table]],
        via: set[str] | None = None,
    ) -> JoinNode:
        """Build a JoinTree rooted at *element_name* that reaches all *include_tables*.

        The algorithm:

        1. Collect all FK paths from `_schema_to_paths()` that start at the element
           table and end at a table in *include_tables*.
        2. For each target table, pick the SHORTEST sub-path from the element.
           If a longer path exists but ALL its intermediates are in *include_tables*,
           prefer it (user disambiguated).  If multiple equally-short paths exist
           and cannot be disambiguated, raise an ambiguity error.
        3. Merge the selected paths into a tree rooted at the element.
        4. Mark association tables (``is_association=True``) so their columns are
           excluded from output but they are still JOINed through.
        5. Set ``join_type="left"`` when the FK column is nullable.

        Args:
            element_name: The dataset element table (tree root), e.g. ``"Image"``.
            include_tables: Set of table names the user wants in the output.
            all_paths: All FK paths from ``_schema_to_paths()``.
            via: Optional set of table names the caller passed as
                ``via=`` — path-only routing hints. Intermediates in
                this set count as "covered" during disambiguation so the
                user can route through an intermediate without adding
                its columns to the output.

        Returns:
            A ``JoinNode`` tree rooted at the element table.

        Raises:
            DerivaMLException: If ambiguous paths cannot be resolved.
        """
        via = via or set()
        covering = include_tables | via
        element_table = self.name_to_table(element_name)

        # ── Step 1: collect sub-paths from element to each include_table ─────
        # Each "all_path" has the structure [Dataset, assoc, element, ..., endpoint].
        # We extract the sub-path starting from the element: [element, ..., endpoint].
        subpaths_by_target: dict[str, list[list[Table]]] = defaultdict(list)

        for path in all_paths:
            if len(path) < 3:
                continue
            if path[2].name != element_name:
                continue
            endpoint = path[-1].name
            if endpoint not in include_tables:
                continue
            # Sub-path from element onward
            sub = path[2:]  # [element, ..., endpoint]
            subpaths_by_target[endpoint].append(sub)

        # The element itself (self-path of length 1)
        if element_name in include_tables:
            subpaths_by_target.setdefault(element_name, []).append([element_table])

        # ── Step 2: for each target, pick the best path ──────────────────────
        selected_subpaths: dict[str, list[Table]] = {}

        for target, subpaths in subpaths_by_target.items():
            if target == element_name:
                # Self-path: no join needed
                selected_subpaths[target] = [element_table]
                continue

            # Deduplicate by table-name signature
            seen_sigs: set[tuple[str, ...]] = set()
            unique: list[list[Table]] = []
            for sp in subpaths:
                sig = tuple(t.name for t in sp)
                if sig not in seen_sigs:
                    seen_sigs.add(sig)
                    unique.append(sp)

            if len(unique) == 1:
                selected_subpaths[target] = unique[0]
                continue

            # Multiple paths — disambiguate.
            # Intermediates are tables between element (sp[0]) and endpoint (sp[-1]).
            path_intermediates = [tuple(t.name for t in sp[1:-1]) for sp in unique]

            # If all have identical intermediates, no ambiguity
            if len(set(path_intermediates)) <= 1:
                selected_subpaths[target] = unique[0]
                continue

            # A path is "covered" if all its non-association intermediates
            # are in include_tables or via.  Association tables (M:N link
            # tables) are infrastructure that the user shouldn't need to
            # name explicitly — they are transparently included in the
            # join chain.
            #
            # Association tables are detected via ``self._is_association_table``
            # (which ignores ERMrest system FKs).

            def _intermediates_covered(sp: list[Table], ints: tuple[str, ...]) -> bool:
                sp_tables = {t.name: t for t in sp}
                for t in ints:
                    if t in covering:
                        # In include_tables OR in via= — explicitly routed.
                        continue
                    tbl = sp_tables.get(t)
                    if tbl is not None and self._is_association_table(tbl):
                        continue  # transparent — doesn't need to be in include_tables
                    return False
                return True

            fully_covered = [
                (sp, ints) for sp, ints in zip(unique, path_intermediates) if _intermediates_covered(sp, ints)
            ]

            if len(fully_covered) == 1:
                # Exactly one covered path. Whether it is direct (no
                # intermediates) or routed through covered intermediates,
                # it is the unique valid choice.
                selected_subpaths[target] = fully_covered[0][0]
                continue

            if len(fully_covered) > 1:
                # Multiple fully-covered paths
                has_explicit = [(sp, ints) for sp, ints in fully_covered if len(ints) > 0]
                if len(has_explicit) == 1:
                    selected_subpaths[target] = has_explicit[0][0]
                    continue
                elif len(has_explicit) == 0:
                    # All direct paths — pick shortest
                    shortest = min(fully_covered, key=lambda x: len(x[0]))
                    selected_subpaths[target] = shortest[0]
                    continue
                else:
                    # Multiple explicit — prefer longest (most specific)
                    max_ints = max(len(ints) for _, ints in has_explicit)
                    longest = [sp for sp, ints in has_explicit if len(ints) == max_ints]
                    if len(longest) == 1:
                        selected_subpaths[target] = longest[0]
                        continue

            if len(fully_covered) == 0:
                # No path is fully covered.  Check if direct path exists.
                direct = [sp for sp, ints in zip(unique, path_intermediates) if len(ints) == 0]
                if len(direct) == 1:
                    selected_subpaths[target] = direct[0]
                    continue

            # Ambiguity error
            path_descriptions = []
            all_ints: set[str] = set()
            for sp, ints in zip(unique, path_intermediates):
                names = [t.name for t in sp]
                path_descriptions.append(" → ".join(names))
                all_ints.update(ints)

            suggestion_tables = all_ints - include_tables
            suggestion = ""
            if suggestion_tables:
                suggestion = (
                    f"\nInclude an intermediate table to disambiguate "
                    f"(e.g., add {', '.join(sorted(suggestion_tables))} to include_tables)."
                )

            raise DerivaMLException(
                f"Ambiguous path between {element_name} and {target}: "
                f"found {len(unique)} FK paths:\n" + "\n".join(f"  {d}" for d in path_descriptions) + suggestion
            )

        # ── Step 3: merge selected paths into a tree ─────────────────────────
        # Build the tree by inserting each selected sub-path into the tree.
        root = JoinNode(
            table=element_table,
            table_name=element_name,
            join_type="inner",
            fk_columns=None,
            is_association=bool(self.is_association(element_name)),
            children=[],
        )

        # Map table_name -> JoinNode for quick lookup during tree building
        node_map: dict[str, JoinNode] = {element_name: root}

        for target, subpath in selected_subpaths.items():
            if target == element_name:
                continue
            # subpath = [element, ..intermediate.., target]
            # Walk the subpath, creating nodes as needed
            for i in range(1, len(subpath)):
                child_table = subpath[i]
                child_name = child_table.name
                parent_table = subpath[i - 1]
                parent_name = parent_table.name

                if child_name in node_map:
                    continue  # Already in tree

                # Get FK column pairs
                col_pairs = self._table_relationship(parent_table, child_table)

                # Determine join type: LEFT when any FK column is nullable.
                join_type = "left" if any(fk_col.nullok for fk_col, _ in col_pairs) else "inner"

                node = JoinNode(
                    table=child_table,
                    table_name=child_name,
                    join_type=join_type,
                    fk_columns=col_pairs,
                    is_association=bool(self.is_association(child_name)),
                    children=[],
                )
                node_map[child_name] = node
                # Attach to parent
                if parent_name in node_map:
                    node_map[parent_name].children.append(node)
                else:
                    # Parent not yet in tree — this shouldn't happen since we
                    # process paths from element outward, but handle gracefully
                    logger.warning(f"Parent {parent_name} not in tree when adding {child_name}")

        return root
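The prefix-sharing merge in Step 3 can be sketched independently of ERMrest on plain dicts. This is a minimal illustration with hypothetical table names, not part of this class:

```python
def merge_paths(root: str, subpaths: list[list[str]]) -> dict:
    """Merge root-anchored paths into a nested child mapping,
    creating each table's node once so shared prefixes deduplicate."""
    tree: dict = {}
    node_map = {root: tree}
    for path in subpaths:
        for parent, child in zip(path, path[1:]):
            if child in node_map:  # already placed in the tree
                continue
            node_map[child] = {}
            node_map[parent][child] = node_map[child]
    return {root: tree}


merge_paths("Image", [["Image", "Subject"], ["Image", "Dataset_Image", "Dataset"]])
# {'Image': {'Subject': {}, 'Dataset_Image': {'Dataset': {}}}}
```

As in the method above, a child already present in the node map is skipped, so paths sharing a prefix reuse the existing nodes instead of duplicating them.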

    # ------------------------------------------------------------------
    # Denormalization planner helpers (Rules 2, 5, 6)
    #
    # These methods largely compose ``_fk_neighbors`` /
    # ``_schema_to_paths`` / ``_is_association_table``; the only new FK
    # walk is the association hop inside ``_outbound_reachable``.
    # ------------------------------------------------------------------

    def _downstream_fk_sources(self, table: str | Table) -> set[Table]:
        """Return tables that have an FK pointing AT *table* (directional downstream).

        Denormalization direction vocabulary:

        - **Upstream** = fewer rows per unit. Subject is upstream of Image
          because each Image has exactly one Subject.
        - **Downstream** = more rows per unit. Image is downstream of
          Subject because each Subject can have many Images.

        In ERMrest terms: if ``Image.Subject`` is an FK pointing at
        ``Subject.RID``, then Image is downstream of Subject — which
        means Image is in ``Subject.referenced_by``.

        This method returns direct downstream neighbors only — it does
        NOT do transparent association-table hopping. Callers that need
        "all reachable downstream tables, hopping through associations"
        should use :meth:`_outbound_reachable`.

        Compare with :meth:`_fk_neighbors`, which is direction-agnostic
        and returns both upstream and downstream neighbors.

        Args:
            table: table name (looked up via :meth:`name_to_table`) or
                :class:`Table` instance.

        Returns:
            Set of :class:`Table` objects whose FK points at *table*,
            filtered to the valid schemas (``domain_schemas ∪
            {ml_schema}``).

        Example::

            # Subject is pointed at by Image.Subject and Observation.Subject:
            model._downstream_fk_sources("Subject")
            # {<Table Image>, <Table Observation>}

            # Image is pointed at by Dataset_Image.Image:
            model._downstream_fk_sources("Image")
            # {<Table Dataset_Image>}
        """
        valid_schemas = self.domain_schemas | {self.ml_schema}
        tbl = table if hasattr(table, "foreign_keys") else self.name_to_table(table)
        targets: set[Table] = set()
        # Tables with FK pointing at us are downstream
        for fk in tbl.referenced_by:
            src = fk.table
            if src.schema.name not in valid_schemas:
                continue
            targets.add(src)
        return targets

    def _outbound_reachable(
        self,
        from_table: str,
        tables_in_set: set[str],
    ) -> set[str]:
        """Return tables in ``tables_in_set`` downstream of ``from_table``.

        BFS reachability over the FK graph in the one-to-many direction.
        Composes :meth:`_downstream_fk_sources` plus association-
        transparency logic — does NOT walk FKs directly.

        **Transparent association hops**: when the walker hits an
        association table (per :meth:`_is_association_table`) that isn't
        in ``tables_in_set``, it hops through it in BOTH directions —
        both the tables that point at the association (inbound) AND the
        tables the association's FKs point at (outbound). This lets
        ``A → assoc → B`` discover B from A even when A → assoc is an
        inbound FK and assoc → B is an outbound FK. Without this
        bidirectional hop, many-to-many relationships (Dataset ↔ Image
        via Dataset_Image) wouldn't be traversable.

        **Direction matters**: with ``Image.Subject → Subject.RID``:

        - ``_outbound_reachable('Subject', {'Image','Subject'})`` returns
          ``{'Image'}`` (Image is downstream of Subject).
        - ``_outbound_reachable('Image', {'Image','Subject'})`` returns
          ``set()`` (Subject is UPSTREAM of Image, not downstream).

        Args:
            from_table: starting table (the "upstream" side of the
                one-to-many relationship).
            tables_in_set: the subgraph — only tables in this set count
                as "destinations" in the result. Association tables
                outside the set are still traversable (transparent).

        Returns:
            Set of names in ``tables_in_set`` downstream of
            ``from_table`` (excluding ``from_table`` itself).

        Example::

            # Given schema: Image.Subject → Subject, Dataset ← Dataset_Image → Image
            subgraph = {"Image", "Subject"}
            model._outbound_reachable("Subject", subgraph)  # {"Image"}
            model._outbound_reachable("Image", subgraph)    # set()

            # With Dataset_Image as a transparent hop:
            subgraph = {"Dataset", "Image"}
            model._outbound_reachable("Dataset", subgraph)  # {"Image"}
        """
        seen_names: set[str] = set()
        visited: set[str] = set()
        stack: list[str] = [from_table]
        while stack:
            t = stack.pop()
            if t in visited:
                continue
            visited.add(t)
            try:
                tbl = self.name_to_table(t)
            except Exception:
                continue

            # When the current node is itself an association table AND it's
            # not the starting point, hop through both directions: both the
            # tables that point at it (referenced_by) AND the tables it
            # points to (foreign_keys). This is the "transparent bridge"
            # semantics — M:N link tables should be traversable in both
            # directions so that A→assoc→B discovers B from A.
            hopping_through_association = t != from_table and self._is_association_table(tbl)

            neighbors: list[Table] = list(self._downstream_fk_sources(t))
            if hopping_through_association:
                # Add the association's outbound FK targets (the "other
                # side" of the M:N link) so we can see past the bridge.
                valid_schemas = self.domain_schemas | {self.ml_schema}
                for fk in tbl.foreign_keys:
                    nxt = fk.pk_table
                    if nxt.schema.name in valid_schemas:
                        neighbors.append(nxt)

            for neighbor in neighbors:
                target_name = neighbor.name
                if target_name == from_table:
                    continue
                if target_name in tables_in_set:
                    seen_names.add(target_name)
                    # Continue only if this is itself an association (transparent)
                    if self._is_association_table(neighbor):
                        stack.append(target_name)
                elif self._is_association_table(neighbor):
                    # Transparent hop: continue through the association
                    stack.append(target_name)
                # else: non-requested, non-association — dead end
        # seen_names already excludes from_table and contains only
        # names from tables_in_set, so it can be returned directly.
        return seen_names
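The bidirectional transparent-hop semantics can be reproduced on a toy adjacency mapping. A minimal sketch, assuming a hand-built `downstream` map with the association already expanded in both directions (all names hypothetical):

```python
def outbound_reachable(start, in_set, downstream, assoc):
    """Names in `in_set` reachable from `start`, treating association
    nodes as transparent bridges even when outside `in_set`."""
    found, visited, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        for nxt in downstream.get(node, ()):
            if nxt == start:
                continue
            if nxt in in_set:
                found.add(nxt)
                if nxt in assoc:  # in-set association: keep walking
                    stack.append(nxt)
            elif nxt in assoc:    # transparent hop through the bridge
                stack.append(nxt)
    return found


# Toy schema: Image.Subject -> Subject, plus Dataset <-> Image linked
# by the association Dataset_Image (expanded in both directions).
downstream = {
    "Subject": {"Image"},
    "Dataset": {"Dataset_Image"},
    "Dataset_Image": {"Image", "Dataset"},
}
assoc = {"Dataset_Image"}

outbound_reachable("Subject", {"Image", "Subject"}, downstream, assoc)  # {'Image'}
outbound_reachable("Dataset", {"Dataset", "Image"}, downstream, assoc)  # {'Image'}
```

Note how direction matters: starting from `"Image"`, no entry in the map leads anywhere, so the result is empty, mirroring the Subject-is-upstream example in the docstring.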

    def _find_sinks(
        self,
        include_tables: list[str],
        via: list[str] | None = None,
    ) -> list[str]:
        """Find sinks in the FK subgraph on ``include_tables ∪ via`` (Rule 2).

        A **sink** is a table in ``include_tables`` with no outbound FK
        (in the one-to-many / downstream sense) to any other table in
        the set. Intuition: the "deepest" table in the requested join —
        its rows reference the others (directly or transitively), while
        nothing in the set references it, so no tables lie downstream.
        In star-schema denormalization, the sink is
        the natural ``row_per`` — one output row per sink row, with
        upstream columns hoisted.

        Composes :meth:`_outbound_reachable`; does not traverse FKs
        itself.

        Args:
            include_tables: requested tables — only these are candidates
                for the sink role (``via`` tables don't contribute columns).
            via: optional additional tables that participate in the
                subgraph for routing but aren't sink candidates.

        Returns:
            Sorted list of sink table names. Normally exactly one.
            Multiple sinks → caller should raise
            :class:`DerivaMLDenormalizeMultiLeaf`. Zero sinks → cycle,
            caller should raise :class:`DerivaMLDenormalizeNoSink`.

        Example::

            # Chain Subject ← Observation ← Image → sink is Image
            model._find_sinks(["Subject", "Observation", "Image"])
            # ["Image"]

            # Unrelated tables → multi-leaf (both are sinks)
            model._find_sinks(["Dataset", "Subject"])
            # ["Dataset", "Subject"]
        """
        via = via or []
        all_tables = set(include_tables) | set(via)
        # A sink is a requested table whose outbound-reach set, minus
        # itself, is empty — i.e., nothing else in the subgraph is
        # downstream of it.
        return sorted(
            t for t in all_tables if t in include_tables and not (self._outbound_reachable(t, all_tables) - {t})
        )
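Sink-finding reduces to a reachability test on the requested subgraph. A minimal sketch on a plain adjacency mapping (hypothetical names, independent of the ERMrest model):

```python
def find_sinks(include_tables, downstream):
    """A table is a sink when nothing else in the subgraph is
    (transitively) downstream of it."""
    tables = set(include_tables)

    def reach(start):
        seen, stack = set(), [start]
        while stack:
            for nxt in downstream.get(stack.pop(), ()):
                if nxt in tables and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen - {start}

    return sorted(t for t in tables if not reach(t))


# Chain Subject <- Observation <- Image (FKs point left): Image is
# the unique sink.
chain = {"Subject": {"Observation"}, "Observation": {"Image"}}
find_sinks(["Subject", "Observation", "Image"], chain)  # ['Image']

# Unrelated tables: both are sinks (the multi-leaf error case).
find_sinks(["Dataset", "Subject"], chain)  # ['Dataset', 'Subject']
```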

    def _determine_row_per(
        self,
        include_tables: list[str],
        via: list[str] | None,
        row_per: str | None,
    ) -> str:
        """Resolve the ``row_per`` table, implementing Rules 2 and 5.

        Two paths:

        - **Explicit** (``row_per`` not None): validate the caller's
          choice. ``row_per`` must be in ``include_tables``, and no
          table in ``include_tables`` may be downstream of it (Rule 5 —
          that would require aggregation, which the current engine
          doesn't do).
        - **Auto-infer** (``row_per is None``): apply Rule 2 via
          sink-finding. Expect exactly one sink.

        Args:
            include_tables: requested tables.
            via: optional path-only tables.
            row_per: caller's explicit leaf, or None to auto-infer.

        Returns:
            The resolved ``row_per`` table name — guaranteed to be in
            ``include_tables`` and free of downstream conflicts.

        Raises:
            ValueError: ``row_per`` is not in ``include_tables``.
            DerivaMLDenormalizeDownstreamLeaf: explicit ``row_per`` has
                downstream table(s) in ``include_tables`` (Rule 5).
            DerivaMLDenormalizeNoSink: no sink found (FK cycle in the
                subgraph — pathological).
            DerivaMLDenormalizeMultiLeaf: auto-inference finds more
                than one candidate sink (Rule 2).

        Example::

            model._determine_row_per(
                include_tables=["Subject", "Image"], via=[], row_per=None
            )
            # "Image" (auto-inferred — Image is the sink)

            # Rule 5: Subject with Image downstream is rejected.
            model._determine_row_per(
                include_tables=["Subject", "Image"], via=[], row_per="Subject"
            )
            # raises DerivaMLDenormalizeDownstreamLeaf
        """
        from deriva_ml.core.exceptions import (
            DerivaMLDenormalizeDownstreamLeaf,
            DerivaMLDenormalizeMultiLeaf,
            DerivaMLDenormalizeNoSink,
        )

        via = via or []
        all_tables = set(include_tables) | set(via)

        if row_per is not None:
            if row_per not in include_tables:
                raise ValueError(f"row_per={row_per!r} must be in include_tables={include_tables}")
            downstream = self._outbound_reachable(row_per, all_tables)
            downstream_in_inc = [t for t in include_tables if t in downstream and t != row_per]
            if downstream_in_inc:
                raise DerivaMLDenormalizeDownstreamLeaf(
                    row_per=row_per,
                    downstream_tables=sorted(downstream_in_inc),
                )
            return row_per

        sinks = self._find_sinks(include_tables, via)
        if not sinks:
            raise DerivaMLDenormalizeNoSink(
                f"No sink found in include_tables={include_tables}. The FK subgraph may contain a cycle."
            )
        if len(sinks) > 1:
            raise DerivaMLDenormalizeMultiLeaf(
                candidates=sinks,
                include_tables=list(include_tables),
            )
        return sinks[0]
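Rules 2 and 5 can be demonstrated together on the same toy representation. A hedged sketch, not the class's actual implementation (which raises the dedicated DerivaML exception types rather than plain `ValueError`):

```python
def determine_row_per(include_tables, downstream, row_per=None):
    """Validate an explicit leaf (Rule 5) or auto-infer the unique
    sink (Rule 2) on a toy downstream-adjacency mapping."""
    tables = set(include_tables)

    def reach(start):  # transitive downstream reach within the subgraph
        seen, stack = set(), [start]
        while stack:
            for nxt in downstream.get(stack.pop(), ()):
                if nxt in tables and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen - {start}

    if row_per is not None:
        if row_per not in tables:
            raise ValueError(f"{row_per!r} must be in include_tables")
        conflicts = sorted(reach(row_per))
        if conflicts:  # Rule 5: the leaf may not have downstream tables
            raise ValueError(f"{row_per!r} has downstream tables {conflicts}")
        return row_per
    sinks = sorted(t for t in tables if not reach(t))
    if len(sinks) != 1:  # Rule 2 requires exactly one sink
        raise ValueError(f"expected exactly one sink, found {sinks}")
    return sinks[0]


determine_row_per(["Subject", "Image"], {"Subject": {"Image"}})  # 'Image'
```

Passing `row_per="Subject"` against the same map raises, because Image is downstream of Subject, which mirrors the Rule 5 rejection in the docstring example.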

    def _enumerate_paths(
        self,
        from_table: str,
        to_table: str,
        tables_in_set: set[str],
        max_depth: int = 6,
    ) -> list[list[str]]:
        """Enumerate simple FK paths from ``from_table`` to ``to_table``.

        **Delegates the DFS** to :meth:`_schema_to_paths` (the
        authoritative FK-graph enumerator — handles cycle detection,
        vocabulary termination, schema filtering, and multi-FK
        deduplication). Uses its ``stop_at`` kwarg so inner recursion
        frames can prune eagerly rather than emitting all prefixes and
        filtering at the top. **Do NOT write a fresh DFS here.**

        The only additional work is a **transparency filter**: a path
        is kept only if every intermediate table (non-endpoint nodes)
        is either in ``tables_in_set`` (the user's requested /
        via-routed set) or is a pure association table (which acts as
        a transparent bridge).

        Args:
            from_table: path start.
            to_table: path end.
            tables_in_set: ``include_tables ∪ via``. Paths passing
                through tables NOT in this set are accepted only if
                every intermediate is a pure association table.
            max_depth: forwarded to :meth:`_schema_to_paths` as a
                safety cap against pathological schemas.

        Returns:
            List of paths, each a list of table-name strings starting
            with ``from_table`` and ending with ``to_table``. Empty if
            no transparent-valid path exists.

        Example::

            # Diamond schema: Image → Subject direct AND Image → Observation → Subject.
            # With Observation in the set, both paths are valid:
            model._enumerate_paths("Image", "Subject", {"Image", "Subject", "Observation"})
            # [["Image", "Subject"], ["Image", "Observation", "Subject"]]

            # With only Image and Subject in the set, the multi-hop path
            # requires Observation as intermediate but it's not in the
            # set and not an association → only the direct path survives:
            model._enumerate_paths("Image", "Subject", {"Image", "Subject"})
            # [["Image", "Subject"]]
        """
        # Delegate the DFS — stop_at tells _schema_to_paths to only
        # keep paths ending at to_table (inner frames can prune early).
        paths = self._schema_to_paths(
            root=self.name_to_table(from_table),
            max_depth=max_depth,
            stop_at=to_table,
        )
        result: list[list[str]] = []
        for path in paths:
            names = [t.name for t in path]
            # Transparency filter: every intermediate must be either
            # requested (in tables_in_set) or a pure association.
            if all(mid in tables_in_set or self._is_association_table(mid) for mid in names[1:-1]):
                result.append(names)
        return result
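The transparency filter is a one-liner once paths are plain name lists. A minimal sketch with hypothetical names:

```python
def transparency_filter(paths, tables_in_set, associations):
    """Keep only paths whose every intermediate is requested or is a
    transparent association table."""
    return [
        p for p in paths
        if all(mid in tables_in_set or mid in associations for mid in p[1:-1])
    ]


diamond = [["Image", "Subject"], ["Image", "Observation", "Subject"]]
transparency_filter(diamond, {"Image", "Subject", "Observation"}, set())
# both paths survive
transparency_filter(diamond, {"Image", "Subject"}, set())
# [['Image', 'Subject']], only the direct path survives
```

An association intermediate passes even when unrequested: a path like `Dataset -> Dataset_Image -> Image` survives with only `Dataset` and `Image` in the set, provided `Dataset_Image` is in `associations`.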

    def _find_path_ambiguities(
        self,
        row_per: str,
        include_tables: list[str],
        via: list[str] | None = None,
    ) -> list[dict[str, Any]]:
        """Enumerate path ambiguities between ``row_per`` and other requested tables (Rule 6).

        For each ``T`` in ``include_tables ∪ via`` (``T ≠ row_per``),
        enumerate all simple FK paths between ``row_per`` and ``T``
        using :meth:`_schema_to_paths` (full undirected DFS — we do
        NOT apply the transparency filter here, because we need to see
        the full picture to detect diamonds the user hasn't yet
        disambiguated).

        **User-signal disambiguation**: a path is considered "signaled"
        by the user if at least one of its intermediate tables is in
        ``include_tables ∪ via`` (pure association tables don't count —
        those are transparent). If exactly one path is signaled, the
        user has picked it and there's no ambiguity. If zero or >1 are
        signaled, we cannot silently choose, so an ambiguity is
        reported.

        This is distinct from :meth:`_enumerate_paths`, which applies
        the transparency filter to produce only "routable" paths given
        the current set. Here we want to see ALL candidates so we can
        warn about the diamond.

        Args:
            row_per: the leaf table (resolved earlier by
                :meth:`_determine_row_per`).
            include_tables: tables whose paths to ``row_per`` are checked.
            via: additional tables whose paths are checked (their columns
                aren't in the output, but they still participate in
                disambiguation).

        Returns:
            List of ambiguity dicts — empty when no ambiguities are
            detected. Each dict has:

            - ``from_table``: always ``row_per``.
            - ``to_table``: the ``T`` with multiple paths.
            - ``paths``: list of path lists (each path a list of table
              names, first element ``row_per``, last element ``T``).
            - ``suggested_intermediates``: non-endpoint tables that
              appear in at least one path but are not in
              ``include_tables`` and are not pure association tables
              — user could add any of these to ``include_tables`` or
              ``via`` to disambiguate.

        Example::

            # Diamond: Image→Subject direct AND Image→Observation→Subject.
            model._find_path_ambiguities(
                row_per="Image", include_tables=["Image", "Subject"]
            )
            # [{"from_table": "Image", "to_table": "Subject",
            #   "paths": [["Image", "Subject"],
            #             ["Image", "Observation", "Subject"]],
            #   "suggested_intermediates": ["Observation"]}]

            # Once Observation is added to include_tables, it "signals"
            # the multi-hop path → no ambiguity:
            model._find_path_ambiguities(
                row_per="Image", include_tables=["Image", "Observation", "Subject"]
            )
            # []
        """
        via = via or []
        all_tables = set(include_tables) | set(via)
        ambiguities: list[dict[str, Any]] = []

        for t in sorted(all_tables):
            if t == row_per:
                continue
            # Enumerate ALL simple paths (no transparency filter) — we need
            # the full picture to detect diamonds even when the user has not
            # requested the intermediate table.
            #
            # Note: we intentionally do NOT call ``_enumerate_paths`` here.
            # That helper applies a transparency filter (intermediates must
            # be requested or be association tables), which would mask the
            # very diamonds this rule must warn about. ``_enumerate_paths``
            # is for consumers who want only "routable" paths given the
            # current include_tables/via set.
            all_path_tables = self._schema_to_paths(
                root=self.name_to_table(row_per),
                max_depth=6,
                stop_at=t,
            )
            all_paths_named: list[list[str]] = [[tbl.name for tbl in p] for p in all_path_tables]
            unique = list({tuple(p): p for p in all_paths_named}.values())
            if len(unique) <= 1:
                continue

            # Monotonic-direction filter for diamond detection:
            # A genuine diamond has MULTIPLE paths that each constitute a
            # valid FK join chain — all-outbound hops (note: "downstream"
            # in this filter means the outbound-FK direction, the reverse
            # of the row-multiplicity sense in ``_downstream_fk_sources``),
            # with association tables acting as transparent bridges. Paths that
            # change direction at an interior vertex are common-neighbor
            # shortcuts, not join alternatives. For example, with::
            #
            #     Image.Observation → Observation  (direct FK)
            #     Image.Subject → Subject           (direct FK)
            #     Observation.Subject → Subject     (direct FK)
            #
            # the undirected walk ``Image → Subject → Observation`` hops
            # Image.Subject downstream then Observation.Subject UPSTREAM
            # (Subject is a shared neighbor). This does not represent an
            # FK chain from Image to Observation — it represents a
            # co-occurrence via shared Subject, which is a materially
            # different query. We exclude such paths from ambiguity
            # detection so the direct FK Image→Observation isn't
            # spuriously flagged.
            #
            # Association tables remain transparent: the walker handles
            # them correctly via ``_is_association_table`` check inside
            # the direction test.
            def _edge_direction(a: str, b: str) -> str | None:
                """Return 'down' if a has a direct FK to b (outbound from
                a); 'up' if b has a direct FK to a (inbound to a); None
                if there's no direct FK between them."""
                try:
                    ta = self.name_to_table(a)
                    tb = self.name_to_table(b)
                except Exception:
                    return None
                for fk in ta.foreign_keys:
                    if fk.pk_table == tb:
                        return "down"
                for fk in tb.foreign_keys:
                    if fk.pk_table == ta:
                        return "up"
                return None

            def _is_downstream_chain(p: list[str]) -> bool:
                """Check that the path is all-downstream, treating pure
                association tables as transparent bridges. A transparent
                bridge Image ← assoc → Subject counts as a single
                downstream step (the assoc's referenced_by connects the
                two sides). Association tables at interior positions
                don't count as direction changes."""
                i = 0
                while i < len(p) - 1:
                    a, b = p[i], p[i + 1]
                    # If b is an interior association table, hop across
                    # it: count the A → assoc → C edge as a single
                    # transparent bridge and move two steps forward.
                    if i + 2 < len(p) and self._is_association_table(b):
                        # A → assoc → C: the bridge is legitimate
                        # regardless of internal direction; advance past.
                        i += 2
                        continue
                    d = _edge_direction(a, b)
                    if d != "down":
                        return False
                    i += 1
                return True

            downstream = [p for p in unique if _is_downstream_chain(p)]
            if len(downstream) <= 1:
                # Only 0 or 1 downstream paths means no genuine diamond;
                # other "paths" were common-neighbor shortcuts. Fall back
                # to the direct/signaled path and don't flag ambiguity.
                continue
            unique = downstream

            # Disambiguation rule:
            # - A path is "signaled" if at least one of its non-endpoint
            #   intermediates is in ``include_tables ∪ via`` (user explicitly
            #   routed through it). Association tables don't count — they're
            #   transparent and the user shouldn't need to name them.
            # - If exactly one path is signaled, the user has picked it → no
            #   ambiguity.
            # - Otherwise (0 or >1 signaled), we cannot silently choose →
            #   ambiguity.
            def _is_signaled(p: list[str]) -> bool:
                intermediates = p[1:-1]
                for mid in intermediates:
                    if mid in all_tables and not self._is_association_table(mid):
                        return True
                return False

            signaled = [p for p in unique if _is_signaled(p)]
            if len(signaled) == 1:
                # Exactly one user-signaled path — use it.
                continue

            # Ambiguity: either no user signal, or conflicting signals.
            reportable = signaled if len(signaled) > 1 else unique
            all_intermediates: set[str] = set()
            for p in reportable:
                for node in p[1:-1]:
                    if node not in include_tables and not self._is_association_table(node):
                        all_intermediates.add(node)
            ambiguities.append(
                {
                    "from_table": row_per,
                    "to_table": t,
                    "paths": reportable,
                    "suggested_intermediates": sorted(all_intermediates),
                }
            )
        return ambiguities

    def _prepare_wide_table(
        self,
        dataset,
        dataset_rid: RID,
        include_tables: list[str],
        *,
        row_per: str | None = None,
        via: list[str] | None = None,
    ) -> tuple[dict[str, Any], list[tuple], bool]:
        """Generate a join plan for denormalizing a dataset into a wide table.

        Uses a **JoinTree** approach that preserves path-specific structure:

        1. **Planner guards** -- validate ``row_per`` (Rule 2 / Rule 5) and
           check for path ambiguity (Rule 6) before any join work.
        2. **Path discovery** -- ``_schema_to_paths()`` discovers all FK paths
           from Dataset through the schema.
        3. **Path filtering & deduplication** -- keep only paths relevant to
           *include_tables*, dedup duplicate association table routes.
        4. **JoinTree construction** -- for each element type, build a tree
           rooted at the element.  Each node is a table to JOIN; association
           tables are in the tree (for JOIN) but excluded from output columns.
           Nullable FK columns produce LEFT JOINs.
        5. **Flatten to legacy format** -- convert the tree to the
           ``(path, join_conditions, join_types)`` tuple expected by
           the unified ``_denormalize_impl()`` in ``local_db/denormalize.py``.

        Args:
            dataset: A DatasetLike object (DatasetBag or Dataset).
            dataset_rid: RID of the dataset.
            include_tables: List of table names to include in the output.
            row_per: Explicit leaf table (one row per this table). If None,
                the sink is auto-inferred from include_tables.
            via: Additional tables used only for path routing (their columns
                are NOT included in the output).

        Returns:
            ``(element_tables, denormalized_columns, multi_schema)`` where:

            - **element_tables** -- ``dict[str, (path, join_conditions, join_types)]``
              keyed by element table name.
              *path* is a list of table name strings in JOIN order (pre-order walk
              of the JoinTree, starting with "Dataset").
              *join_conditions* maps ``table_name -> set[(fk_col, pk_col)]``.
              *join_types* maps ``table_name -> "inner" | "left"``.
            - **denormalized_columns** -- list of
              ``(schema_name, table_name, column_name, type_name)`` for the output.
            - **multi_schema** -- True if output spans multiple domain schemas.

        Raises:
            DerivaMLDenormalizeMultiLeaf / DerivaMLDenormalizeNoSink /
            DerivaMLDenormalizeDownstreamLeaf: from :meth:`_determine_row_per`.
            DerivaMLDenormalizeAmbiguousPath: if more than one FK path exists
                between row_per and a requested table.
        """
        include_tables_set = set(include_tables)
        for t in include_tables_set:
            _ = self.name_to_table(t)  # validate existence
        via_list = list(via or [])
        for t in via_list:
            _ = self.name_to_table(t)  # validate existence

        # ── Phase 0: planner guards (Rules 2, 5, 6) ──────────────────────────
        # Empty include_tables is a legal degenerate case (caller passes no
        # requested tables and expects an empty result). Skip guards then.
        if include_tables:
            resolved_row_per = self._determine_row_per(
                include_tables=list(include_tables),
                via=via_list,
                row_per=row_per,
            )
            ambiguities = self._find_path_ambiguities(
                row_per=resolved_row_per,
                include_tables=list(include_tables),
                via=via_list,
            )
            if ambiguities:
                from deriva_ml.core.exceptions import DerivaMLDenormalizeAmbiguousPath

                a = ambiguities[0]
                raise DerivaMLDenormalizeAmbiguousPath(
                    from_table=a["from_table"],
                    to_table=a["to_table"],
                    paths=a["paths"],
                    suggested_intermediates=a["suggested_intermediates"],
                )

        # ── Phase 1: path discovery ──────────────────────────────────────────
        all_paths = self._schema_to_paths()

        # Filter paths: must end at a table in include_tables AND
        # have at least one table in include_tables along the path.
        table_paths = [
            path
            for path in all_paths
            if path[-1].name in include_tables_set and include_tables_set.intersection({p.name for p in path})
        ]

        # ── Phase 1b: deduplicate association table routes ───────────────────
        # In some catalogs (e.g., eye-ai), both Image_Dataset and Dataset_Image
        # exist.  Keep only one route per (element, endpoint) via different
        # association tables (path[1]).
        deduplicated_paths: list[list[Table]] = []
        seen_element_endpoint: dict[tuple[str, str], tuple[list[Table], Table]] = {}

        def _is_standard_assoc(assoc_name: str, element_name: str) -> bool:
            """Check if assoc table matches the Dataset_{Element} naming pattern."""
            return assoc_name == f"Dataset_{element_name}"

        for path in table_paths:
            if len(path) < 3:
                deduplicated_paths.append(path)
                continue
            assoc_table = path[1]
            element = path[2]
            endpoint = path[-1]
            key = (element.name, endpoint.name)

            if key not in seen_element_endpoint:
                seen_element_endpoint[key] = (path, assoc_table)
                deduplicated_paths.append(path)
            else:
                existing_path, existing_assoc = seen_element_endpoint[key]
                if existing_assoc.name != assoc_table.name:
                    # Duplicate route via different association table.
                    # Prefer the standard Dataset_{Element} pattern over legacy.
                    if _is_standard_assoc(assoc_table.name, element.name) and not _is_standard_assoc(
                        existing_assoc.name, element.name
                    ):
                        # Replace existing with standard pattern
                        deduplicated_paths = [
                            p for p in deduplicated_paths if not (len(p) >= 3 and (p[2].name, p[-1].name) == key)
                        ]
                        seen_element_endpoint[key] = (path, assoc_table)
                        deduplicated_paths.append(path)
                    # else: keep existing (either it's standard or both are non-standard)
                else:
                    deduplicated_paths.append(path)

        table_paths = deduplicated_paths

        # ── Phase 1c: group by element, filter to elements in include_tables ─
        paths_by_element: dict[str, list[list[Table]]] = defaultdict(list)
        for p in table_paths:
            if len(p) >= 3:
                paths_by_element[p[2].name].append(p)

        paths_by_element = {elem: paths for elem, paths in paths_by_element.items() if elem in include_tables_set}

        # ── Phase 2: build JoinTree per element ──────────────────────────────
        skip_columns = {"RCT", "RMT", "RCB", "RMB"}
        element_tables: dict[str, tuple[list[str], dict[str, set], dict[str, str]]] = {}

        for element_name, paths in paths_by_element.items():
            tree = self._build_join_tree(element_name, include_tables_set, table_paths, via=set(via_list))

            # ── Phase 3: flatten JoinTree to legacy format ───────────────────
            # Pre-order walk gives us the correct JOIN order.
            # We prepend "Dataset" and the association table that connects
            # Dataset to the element (taken from paths[0][0:3]).

            # Find the Dataset -> assoc -> element prefix from the first path
            if paths and len(paths[0]) >= 3:
                dataset_name = paths[0][0].name  # "Dataset"
                assoc_name = paths[0][1].name  # e.g. "Dataset_Image"
            else:
                dataset_name = "Dataset"
                assoc_name = None

            # Walk the tree to get the join order (element -> children)
            tree_nodes = tree.walk()

            # Build the legacy path: [Dataset, assoc, element, ...tree children...]
            path_names: list[str] = [dataset_name]
            if assoc_name:
                path_names.append(assoc_name)

            # Add tree nodes (element first, then its subtree in pre-order)
            for node in tree_nodes:
                if node.table_name not in path_names:
                    path_names.append(node.table_name)

            # Build join conditions and join types from the tree edges
            join_conditions: dict[str, set[tuple]] = {}
            join_types: dict[str, str] = {}

            # First, add the Dataset -> assoc and assoc -> element conditions
            if assoc_name:
                dataset_table = self.name_to_table(dataset_name)
                assoc_table_obj = self.name_to_table(assoc_name)
                try:
                    col_pairs = self._table_relationship(dataset_table, assoc_table_obj)
                    join_conditions[assoc_name] = set(col_pairs)
                    join_types[assoc_name] = "inner"
                except DerivaMLException:
                    pass

                try:
                    col_pairs = self._table_relationship(assoc_table_obj, tree.table)
                    join_conditions[tree.table_name] = set(col_pairs)
                    join_types[tree.table_name] = "inner"
                except DerivaMLException:
                    pass

            # Add conditions from the JoinTree edges
            for parent_node, child_node in tree.walk_edges():
                if child_node.fk_columns:
                    join_conditions[child_node.table_name] = set(child_node.fk_columns)
                    join_types[child_node.table_name] = child_node.join_type

            element_tables[element_name] = (path_names, join_conditions, join_types)

        # ── Phase 4: build denormalized column list ──────────────────────────
        denormalized_columns = []
        for table_name in include_tables_set:
            if self.is_association(table_name):
                continue
            table = self.name_to_table(table_name)
            for c in table.columns:
                if c.name not in skip_columns:
                    denormalized_columns.append((table.schema.name, table_name, c.name, c.type.typename))

        output_schemas = {s for s, _, _, _ in denormalized_columns if self.is_domain_schema(s)}
        multi_schema = len(output_schemas) > 1

        return element_tables, denormalized_columns, multi_schema

    def _table_relationship(
        self,
        table1: TableInput,
        table2: TableInput,
    ) -> list[tuple[Column, Column]]:
        """Return column pairs used to relate two tables.

        For simple FKs, returns a single-element list: [(fk_col, pk_col)].
        For composite FKs, returns multiple pairs: [(fk_col1, pk_col1), (fk_col2, pk_col2)].

        Each FK constraint counts as one relationship (even if composite),
        so ambiguity is detected when multiple separate FK constraints exist
        between the same two tables.
        """
        table1 = self.name_to_table(table1)
        table2 = self.name_to_table(table2)
        # Each FK constraint produces a list of (fk_col, pk_col) pairs
        relationships: list[list[tuple[Column, Column]]] = []
        for fk in table1.foreign_keys:
            if fk.pk_table == table2:
                pairs = list(zip(fk.foreign_key_columns, fk.referenced_columns))
                relationships.append(pairs)
        for fk in table1.referenced_by:
            if fk.table == table2:
                pairs = list(zip(fk.referenced_columns, fk.foreign_key_columns))
                relationships.append(pairs)

        if len(relationships) == 0:
            raise DerivaMLException(
                f"No FK relationship found between {table1.name} and {table2.name}. "
                f"These tables may not be directly connected. Check your include_tables list."
            )
        if len(relationships) > 1:
            path_descriptions = []
            for col_pairs in relationships:
                desc = ", ".join(
                    f"{fk_col.table.name}.{fk_col.name}{pk_col.table.name}.{pk_col.name}"
                    for fk_col, pk_col in col_pairs
                )
                path_descriptions.append(f"  {desc}")
            raise DerivaMLException(
                f"Ambiguous linkage between {table1.name} and {table2.name}: "
                f"found {len(relationships)} FK relationships:\n" + "\n".join(path_descriptions)
            )
        return relationships[0]

    # Default tables to skip during FK path traversal.
    # These are ML schema tables that create unwanted traversal branches:
    # - Dataset_Dataset: nested dataset self-reference (handled separately)
    # - Execution: execution tracking (not useful for data traversal)
    _DEFAULT_SKIP_TABLES = frozenset({"Dataset_Dataset", "Execution"})

    def _schema_to_paths(
        self,
        root: Table | None = None,
        path: list[Table] | None = None,
        exclude_tables: set[str] | None = None,
        skip_tables: frozenset[str] | None = None,
        max_depth: int | None = None,
        stop_at: str | None = None,
    ) -> list[list[Table]]:
        """Discover all FK paths through the schema graph via depth-first traversal.

        This is the shared foundation for both bag export (catalog_graph._collect_paths)
        and denormalization (_prepare_wide_table). Changes here affect both systems.

        Traversal rules:
        - Follows both outbound FKs (table.foreign_keys) and inbound FKs (table.referenced_by)
        - Only traverses tables in valid schemas (domain + ML)
        - Terminates at vocabulary tables (paths go INTO vocabs but not OUT)
        - Skips tables in exclude_tables and skip_tables
        - Detects and skips cycles (same table appearing twice in a path)
        - Prevents dataset element loopback (traversing back to Dataset via element associations)
        - When multiple FKs exist between the same two domain tables, deduplicates
          arcs to avoid redundant paths (keeps one arc per target table)

        Args:
            root: Starting table. Defaults to the Dataset table in the ML schema.
            path: Current path being built (used during recursion).
            exclude_tables: Caller-specified table names to skip. These tables and
                all paths through them are pruned from the result.
            skip_tables: Infrastructure table names to skip. Defaults to
                _DEFAULT_SKIP_TABLES (Dataset_Dataset, Execution). Override to
                customize which ML schema tables are excluded from traversal.
            max_depth: Maximum path length (number of tables). None = unlimited.
                Use to protect against pathological schemas with deep chains.
            stop_at: If given, return only paths whose final table's name equals
                ``stop_at``. The root-only path ``[root]`` is excluded unless
                ``root.name == stop_at``. Default ``None`` returns all prefixes
                (the original behavior).

        Returns:
            List of paths, where each path is a list of Table objects starting
            from root. Every prefix of a path is also included (e.g., if
            [Dataset, A, B, C] is a path, then [Dataset], [Dataset, A], and
            [Dataset, A, B] are also in the result).
        """
        exclude_tables = exclude_tables or set()
        skip_tables = skip_tables if skip_tables is not None else self._DEFAULT_SKIP_TABLES

        root = root or self.model.schemas[self.ml_schema].tables["Dataset"]
        path = path.copy() if path else []
        parent = path[-1] if path else None  # Table we are coming from.
        path.append(root)
        paths = [path]

        # Depth limit check
        if max_depth is not None and len(path) >= max_depth:
            if stop_at is not None:
                return [p for p in paths if p and p[-1].name == stop_at]
            return paths

        def is_nested_dataset_loopback(n1: Table, n2: Table) -> bool:
            """Check if traversal would loop back to Dataset via an element association.

            Prevents: Subject -> Dataset_Subject -> Dataset (looping back to root).
            Allows: Dataset -> Dataset_Subject -> Subject (the intended direction).

            Uses :meth:`_is_association_table` (FK-arity topology) rather
            than ermrest's ``find_associations(pure=True)`` so that non-
            pure association tables — bridges that carry user metadata
            like ``Image_Dataset_Legacy`` — are ALSO recognized as
            dataset-element associations and excluded from upstream
            traversal. Without this, walking Image → Image_Dataset_Legacy →
            Dataset creates a phantom "hub" path that spuriously connects
            Image to any other dataset-member table (e.g. Subject,
            Observation) through a different Dataset_X association,
            producing false Rule-6 ambiguities.
            """
            dataset_table = self.model.schemas[self.ml_schema].tables["Dataset"]
            if n1 == dataset_table:
                # Outbound from Dataset → Dataset_X is always fine.
                return False
            # Is n2 an association table that points at Dataset (i.e. one
            # of its FK targets is the Dataset root)?
            if not self._is_association_table(n2):
                return False
            for fk in n2.foreign_keys:
                if fk.pk_table == dataset_table:
                    return True
            return False

        # Vocabulary tables are terminal — traverse INTO but not OUT.
        if self.is_vocabulary(root):
            if stop_at is not None:
                return [p for p in paths if p and p[-1].name == stop_at]
            return paths

        for child in self._fk_neighbors(root):
            if child.name in skip_tables:
                continue
            if child.name in exclude_tables:
                continue
            if child == parent:
                # Don't loop back to immediate parent via referenced_by
                continue
            if is_nested_dataset_loopback(root, child):
                continue
            if child in path:
                # Cycle detected — skip to avoid infinite recursion.
                logger.warning(f"Cycle in schema path: {child.name} path:{[p.name for p in path]}, skipping")
                continue

            paths.extend(self._schema_to_paths(child, path, exclude_tables, skip_tables, max_depth, stop_at))
        if stop_at is not None:
            return [p for p in paths if p and p[-1].name == stop_at]
        return paths

    def create_table(self, table_def: TableDefinition, schema: str | None = None) -> Table:
        """Create a new table from TableDefinition.

        Args:
            table_def: Table definition (dataclass or dict).
            schema: Schema to create the table in. If None, uses default_schema.

        Returns:
            The newly created Table.

        Raises:
            DerivaMLException: If no schema specified and default_schema is not set.

        Note: @validate_call removed because TableDefinition is now a dataclass from
        deriva.core.typed and Pydantic validation doesn't work well with dataclass fields.
        """
        schema = schema or self._require_default_schema()
        # Handle both TableDefinition (dataclass with to_dict) and plain dicts
        table_dict = table_def.to_dict() if hasattr(table_def, "to_dict") else table_def
        return self.model.schemas[schema].create_table(table_dict)

    def _define_association(
        self,
        associates: list,
        metadata: list | None = None,
        table_name: str | None = None,
        comment: str | None = None,
        **kwargs,
    ) -> dict:
        """Build an association table definition with vocab-aware key selection.

        Wraps Table.define_association to ensure non-vocabulary tables use RID
        as their foreign key target. The default key search heuristic in
        define_association prefers Name/ID keys over RID, which is correct for
        vocabulary tables (FK to human-readable Name) but wrong for domain
        tables that happen to have non-nullable Name or ID keys (e.g., tables
        in cloned catalogs like FaceBase).

        Args:
            associates: Reference targets being associated (Table, Key, or tuples).
            metadata: Additional metadata fields and/or reference targets.
            table_name: Name for the association table.
            comment: Comment for the association table.
            **kwargs: Additional arguments passed to Table.define_association.

        Returns:
            Table definition dict suitable for create_table.
        """
        metadata = metadata or []

        def _resolve_key(ref):
            """Convert non-vocabulary Table references to their RID Key."""
            if isinstance(ref, tuple):
                # (name, Table) or (name, nullok, Table) — resolve the Table element
                items = list(ref)
                table_obj = items[-1]
                if isinstance(table_obj, Table) and not table_obj.is_vocabulary():
                    items[-1] = table_obj.key_by_columns(["RID"])
                return tuple(items)
            elif isinstance(ref, Table) and not ref.is_vocabulary():
                return ref.key_by_columns(["RID"])
            return ref  # Key objects or vocabulary Tables pass through

        resolved_associates = [_resolve_key(a) for a in associates]
        resolved_metadata = [_resolve_key(m) for m in metadata]

        return Table.define_association(
            associates=resolved_associates,
            metadata=resolved_metadata,
            table_name=table_name,
            comment=comment,
            **kwargs,
        )
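
The prefix-path traversal implemented by `_schema_to_paths` above can be sketched offline. The toy function below (the adjacency dict and table names are illustrative, not the library's API) mirrors the core rules: every prefix of a discovered path is returned, immediate backtracking and cycles are skipped, and a `stop_at` filter keeps only paths ending at a given table:

```python
def schema_to_paths(graph, root, path=None, stop_at=None):
    """Toy sketch of the DFS in _schema_to_paths over an adjacency dict.

    Returns every prefix of every discovered path; with stop_at set,
    keeps only paths whose final node matches.
    """
    path = (path or []) + [root]
    paths = [path]
    parent = path[-2] if len(path) > 1 else None
    for child in graph.get(root, []):
        if child == parent or child in path:  # no backtracking, no cycles
            continue
        paths.extend(schema_to_paths(graph, child, path, stop_at))
    if stop_at is not None:
        return [p for p in paths if p[-1] == stop_at]
    return paths

# Dataset -> Dataset_Subject -> Subject -> Observation (toy schema)
graph = {
    "Dataset": ["Dataset_Subject"],
    "Dataset_Subject": ["Subject"],
    "Subject": ["Observation"],
}
all_paths = schema_to_paths(graph, "Dataset")
# Every prefix is present: [Dataset], [Dataset, Dataset_Subject], ...
print(all_paths)
print(schema_to_paths(graph, "Dataset", stop_at="Observation"))
```

Note how the `stop_at` variant returns only the single full path, matching the documented behavior that the root-only path is excluded unless the root itself matches.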

chaise_config property

chaise_config: dict[str, Any]

Return the chaise configuration.

__init__

__init__(
    model: Model,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: str
    | set[str]
    | None = None,
    default_schema: str | None = None,
)

Create and initialize a DerivaModel instance.

This method will connect to a catalog and initialize schema configuration. This class is intended to be used as a base class on which domain-specific interfaces are built.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `model` | `Model` | The ERMRest model for the catalog. | *required* |
| `ml_schema` | `str` | The ML schema name. | `ML_SCHEMA` |
| `domain_schemas` | `str \| set[str] \| None` | Optional explicit set of domain schema names. If None, auto-detects all non-system schemas. | `None` |
| `default_schema` | `str \| None` | The default schema for table creation operations. If None and there is exactly one domain schema, that schema is used as default. If there are multiple domain schemas, default_schema must be specified. | `None` |
Source code in src/deriva_ml/model/catalog.py
def __init__(
    self,
    model: Model,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: str | set[str] | None = None,
    default_schema: str | None = None,
):
    """Create and initialize a DerivaModel instance.

    This method will connect to a catalog and initialize schema configuration.
    This class is intended to be used as a base class on which domain-specific interfaces are built.

    Args:
        model: The ERMRest model for the catalog.
        ml_schema: The ML schema name.
        domain_schemas: Optional explicit set of domain schema names. If None,
            auto-detects all non-system schemas.
        default_schema: The default schema for table creation operations. If None
            and there is exactly one domain schema, that schema is used as default.
            If there are multiple domain schemas, default_schema must be specified.
    """
    self.model = model
    self.configuration = None
    self.catalog: ErmrestCatalog = self.model.catalog
    self.hostname = self.catalog.deriva_server.server if isinstance(self.catalog, ErmrestCatalog) else "localhost"

    self.ml_schema = ml_schema
    self._system_schemas = frozenset(SYSTEM_SCHEMAS | {ml_schema})

    # Determine domain schemas
    if domain_schemas is not None:
        if isinstance(domain_schemas, str):
            domain_schemas = {domain_schemas}
        self.domain_schemas = frozenset(domain_schemas)
    else:
        # Auto-detect all domain schemas
        self.domain_schemas = _get_domain_schemas(self.model.schemas.keys(), ml_schema)

    # Determine default schema for table creation
    if default_schema is not None:
        if default_schema not in self.domain_schemas:
            raise DerivaMLException(
                f"default_schema '{default_schema}' is not in domain_schemas: {self.domain_schemas}"
            )
        self.default_schema = default_schema
    elif len(self.domain_schemas) == 1:
        # Single domain schema - use it as default
        self.default_schema = next(iter(self.domain_schemas))
    elif len(self.domain_schemas) == 0:
        # No domain schemas - default_schema will be None
        self.default_schema = None
    else:
        # Multiple domain schemas, no explicit default
        self.default_schema = None

apply

apply() -> None

Call ERMRestModel.apply

Source code in src/deriva_ml/model/catalog.py
def apply(self) -> None:
    """Call ERMRestModel.apply"""
    if self.catalog == "file-system":
        raise DerivaMLException("Cannot apply() to non-catalog model.")
    else:
        self.model.apply()

asset_metadata

asset_metadata(
    table: str | Table,
) -> set[str]

Return the metadata columns for an asset table.
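
The metadata set is simply the table's column names minus the standard Deriva asset columns. A standalone sketch of that set difference (the column names and the contents of the stand-in asset-column set below are illustrative, not the library's actual `DerivaAssetColumns` definition):

```python
# Illustrative stand-in for DerivaAssetColumns (the real set lives in deriva_ml).
ASSET_COLUMNS = {"RID", "RCT", "RMT", "RCB", "RMB", "URL", "Filename", "Length", "MD5", "Description"}

def asset_metadata(column_names: set[str]) -> set[str]:
    """Toy sketch: metadata columns are everything not in the asset set."""
    return column_names - ASSET_COLUMNS

cols = {"RID", "URL", "Filename", "Length", "MD5", "Description", "Modality", "Subject"}
print(sorted(asset_metadata(cols)))  # ['Modality', 'Subject']
```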

Source code in src/deriva_ml/model/catalog.py
def asset_metadata(self, table: str | Table) -> set[str]:
    """Return the metadata columns for an asset table."""

    table = self.name_to_table(table)

    if not self.is_asset(table):
        raise DerivaMLTableTypeError("asset table", table.name)
    return {c.name for c in table.columns} - DerivaAssetColumns

asset_metadata_columns

asset_metadata_columns(
    table: str | Table,
) -> list[Column]

Return Column objects for the asset-metadata columns of table.

Like asset_metadata but returns the Column instances (not just names) so callers can inspect attributes such as nullok. Results are sorted by column name for deterministic iteration.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `table` | `str \| Table` | Asset table name or Table object. | *required* |

Returns:

| Type | Description |
| --- | --- |
| `list[Column]` | Sorted list of Column objects. |

Raises:

| Type | Description |
| --- | --- |
| `DerivaMLTableTypeError` | If table is not an asset table. |

Source code in src/deriva_ml/model/catalog.py
def asset_metadata_columns(self, table: str | Table) -> list[Column]:
    """Return Column objects for the asset-metadata columns of ``table``.

    Like :meth:`asset_metadata` but returns the :class:`Column`
    instances (not just names) so callers can inspect attributes
    such as ``nullok``. Results are sorted by column name for
    deterministic iteration.

    Args:
        table: Asset table name or Table object.

    Returns:
        Sorted list of Column objects.

    Raises:
        DerivaMLTableTypeError: If ``table`` is not an asset table.
    """
    table = self.name_to_table(table)
    if not self.is_asset(table):
        raise DerivaMLTableTypeError("asset table", table.name)
    return sorted(
        (c for c in table.columns if c.name not in DerivaAssetColumns),
        key=lambda c: c.name,
    )

create_table

create_table(
    table_def: TableDefinition,
    schema: str | None = None,
) -> Table

Create a new table from TableDefinition.

Parameters:

Name Type Description Default
table_def TableDefinition

Table definition (dataclass or dict).

required
schema str | None

Schema to create the table in. If None, uses default_schema.

None

Returns:

Type Description
Table

The newly created Table.

Raises:

Type Description
DerivaMLException

If no schema specified and default_schema is not set.

Note: @validate_call removed because TableDefinition is now a dataclass from deriva.core.typed and Pydantic validation doesn't work well with dataclass fields.

Source code in src/deriva_ml/model/catalog.py
def create_table(self, table_def: TableDefinition, schema: str | None = None) -> Table:
    """Create a new table from TableDefinition.

    Args:
        table_def: Table definition (dataclass or dict).
        schema: Schema to create the table in. If None, uses default_schema.

    Returns:
        The newly created Table.

    Raises:
        DerivaMLException: If no schema specified and default_schema is not set.

    Note: @validate_call removed because TableDefinition is now a dataclass from
    deriva.core.typed and Pydantic validation doesn't work well with dataclass fields.
    """
    schema = schema or self._require_default_schema()
    # Handle both TableDefinition (dataclass with to_dict) and plain dicts
    table_dict = table_def.to_dict() if hasattr(table_def, "to_dict") else table_def
    return self.model.schemas[schema].create_table(table_dict)

find_assets

find_assets(
    with_metadata: bool = False,
) -> list[Table]

Return the list of asset tables in the current model

Source code in src/deriva_ml/model/catalog.py
def find_assets(self, with_metadata: bool = False) -> list[Table]:
    """Return the list of asset tables in the current model"""
    return [t for s in self.model.schemas.values() for t in s.tables.values() if self.is_asset(t)]

find_association

find_association(
    table1: Table | str,
    table2: Table | str,
) -> tuple[Table, Column, Column]

Given two tables, return the association table that connects them and the two columns used to link them.

Source code in src/deriva_ml/model/catalog.py
def find_association(self, table1: Table | str, table2: Table | str) -> tuple[Table, Column, Column]:
    """Given two tables, return an association table that connects the two and the two columns used to link them..

    Raises:
        DerivaML exception if there is either not an association table or more than one association table.
    """
    table1 = self.name_to_table(table1)
    table2 = self.name_to_table(table2)

    tables = [
        (a.table, a.self_fkey.columns[0].name, other_key.columns[0].name)
        for a in table1.find_associations(pure=False)
        if len(a.other_fkeys) == 1 and (other_key := a.other_fkeys.pop()).pk_table == table2
    ]

    if len(tables) == 1:
        return tables[0]
    elif len(tables) == 0:
        raise DerivaMLException(f"No association tables found between {table1.name} and {table2.name}.")
    else:
        raise DerivaMLException(
            f"There are {len(tables)} association tables between {table1.name} and {table2.name}."
        )
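The exactly-one requirement can be illustrated on its own, with a toy stand-in for the candidate associations (names here are hypothetical, not the deriva-py API):

```python
from dataclasses import dataclass

@dataclass
class Association:
    """Toy stand-in for one candidate association between two tables."""
    table: str
    self_column: str
    other_column: str

def pick_association(candidates: list[Association], t1: str, t2: str) -> Association:
    # find_association requires exactly one candidate; zero or many is an error.
    if len(candidates) == 1:
        return candidates[0]
    if not candidates:
        raise ValueError(f"No association tables found between {t1} and {t2}.")
    raise ValueError(f"There are {len(candidates)} association tables between {t1} and {t2}.")

assoc = pick_association(
    [Association("Dataset_Image", "Dataset", "Image")], "Dataset", "Image"
)
print(assoc.table)  # Dataset_Image
```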

find_features

find_features(
    table: TableInput | None = None,
) -> Iterable[Feature]

List features in the catalog.

If a table is specified, returns only features for that table. If no table is specified, returns all features across all tables in the catalog.

Parameters:

Name Type Description Default
table TableInput | None

Optional table to find features for. If None, returns all features in the catalog.

None

Returns:

Type Description
Iterable[Feature]

An iterable of Feature instances describing the features.

Source code in src/deriva_ml/model/catalog.py
@validate_call(config=ConfigDict(arbitrary_types_allowed=True))
def find_features(self, table: TableInput | None = None) -> Iterable[Feature]:
    """List features in the catalog.

    If a table is specified, returns only features for that table.
    If no table is specified, returns all features across all tables in the catalog.

    Args:
        table: Optional table to find features for. If None, returns all features
            in the catalog.

    Returns:
        An iterable of Feature instances describing the features.
    """

    def is_feature(a: FindAssociationResult) -> bool:
        """Check if association represents a feature.

        Args:
            a: Association result to check
        Returns:
            bool: True if association represents a feature
        """
        return {
            "Feature_Name",
            "Execution",
            a.self_fkey.foreign_key_columns[0].name,
        }.issubset({c.name for c in a.table.columns})

    def find_table_features(t: Table) -> list[Feature]:
        """Find all features for a single table."""
        return [
            Feature(a, self) for a in t.find_associations(min_arity=3, max_arity=3, pure=False) if is_feature(a)
        ]

    if table is not None:
        # Find features for a specific table
        return find_table_features(self.name_to_table(table))
    else:
        # Find all features across all domain and ML schema tables
        features: list[Feature] = []
        for schema_name in [*self.domain_schemas, self.ml_schema]:
            schema = self.model.schemas.get(schema_name)
            if schema:
                for t in schema.tables.values():
                    features.extend(find_table_features(t))
        return features

find_vocabularies

find_vocabularies() -> list[Table]

Return a list of all controlled vocabulary tables in domain and ML schemas.

Source code in src/deriva_ml/model/catalog.py
def find_vocabularies(self) -> list[Table]:
    """Return a list of all controlled vocabulary tables in domain and ML schemas."""
    tables = []
    for schema_name in [*self.domain_schemas, self.ml_schema]:
        schema = self.model.schemas.get(schema_name)
        if schema:
            tables.extend(t for t in schema.tables.values() if self.is_vocabulary(t))
    return tables

from_cached classmethod

from_cached(
    schema_dict: dict,
    *,
    catalog,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: "str | set[str] | None" = None,
    default_schema: "str | None" = None,
) -> "DerivaModel"

Construct a DerivaModel from a cached ermrest /schema dict.

No network is touched. The catalog argument is passed to deriva-py's Model(catalog, model_doc) constructor as the first positional argument; in offline mode it will be a :class:~deriva_ml.core.catalog_stub.CatalogStub, in online mode it is a real ErmrestCatalog. DerivaModel.__init__ then reads the catalog back off model.catalog as usual.

This replicates what Model.fromcatalog(catalog) does online — the online call fetches catalog.get("/schema").json() and passes the result to Model(catalog, dict). Here we pass in the already-cached dict from :class:~deriva_ml.core.schema_cache.SchemaCache.

Parameters:

Name Type Description Default
schema_dict dict

The JSON payload from a previous catalog.get('/schema').json() call, as persisted by SchemaCache.

required
catalog

The catalog object to associate with the model. Pass a real ErmrestCatalog online, or a CatalogStub offline.

required
ml_schema str

ML schema name (default "deriva-ml").

ML_SCHEMA
domain_schemas 'str | set[str] | None'

Optional explicit set of domain schema names. If None, auto-detects all non-system schemas from the cached dict.

None
default_schema 'str | None'

Optional default schema name.

None

Returns:

Type Description
'DerivaModel'

A DerivaModel wrapping a deriva-py Model

'DerivaModel'

reconstructed from the dict.

Source code in src/deriva_ml/model/catalog.py
@classmethod
def from_cached(
    cls,
    schema_dict: dict,
    *,
    catalog,
    ml_schema: str = ML_SCHEMA,
    domain_schemas: "str | set[str] | None" = None,
    default_schema: "str | None" = None,
) -> "DerivaModel":
    """Construct a DerivaModel from a cached ermrest /schema dict.

    No network is touched. The ``catalog`` argument is passed to
    deriva-py's ``Model(catalog, model_doc)`` constructor as the
    first positional argument; in offline mode it will be a
    :class:`~deriva_ml.core.catalog_stub.CatalogStub`, in online
    mode it is a real ``ErmrestCatalog``. ``DerivaModel.__init__``
    then reads the catalog back off ``model.catalog`` as usual.

    This replicates what ``Model.fromcatalog(catalog)`` does
    online — the online call fetches
    ``catalog.get("/schema").json()`` and passes the result to
    ``Model(catalog, dict)``. Here we pass in the already-cached
    dict from :class:`~deriva_ml.core.schema_cache.SchemaCache`.

    Args:
        schema_dict: The JSON payload from a previous
            ``catalog.get('/schema').json()`` call, as persisted
            by ``SchemaCache``.
        catalog: The catalog object to associate with the model.
            Pass a real ``ErmrestCatalog`` online, or a
            ``CatalogStub`` offline.
        ml_schema: ML schema name (default ``"deriva-ml"``).
        domain_schemas: Optional explicit set of domain schema
            names. If None, auto-detects all non-system schemas
            from the cached dict.
        default_schema: Optional default schema name.

    Returns:
        A ``DerivaModel`` wrapping a deriva-py ``Model``
        reconstructed from the dict.
    """
    from deriva.core.ermrest_model import Model

    # Model.__init__(catalog, model_doc) stores catalog as
    # self._catalog and exposes it via the .catalog property;
    # DerivaModel.__init__ then reads self.model.catalog.
    model = Model(catalog, schema_dict)
    return cls(
        model,
        ml_schema=ml_schema,
        domain_schemas=domain_schemas,
        default_schema=default_schema,
    )

get_schema_description

get_schema_description(
    include_system_columns: bool = False,
) -> dict[str, Any]

Return a JSON description of the catalog schema structure.

Provides a structured representation of the domain and ML schemas including tables, columns, foreign keys, and relationships. Useful for understanding the data model structure programmatically.

Parameters:

Name Type Description Default
include_system_columns bool

If True, include RID, RCT, RMT, RCB, RMB columns. Default False to reduce output size.

False

Returns:

Type Description
dict[str, Any]

Dictionary with schema structure:

    {
        "domain_schemas": ["schema_name1", "schema_name2"],
        "default_schema": "schema_name1",
        "ml_schema": "deriva-ml",
        "schemas": {
            "schema_name": {
                "tables": {
                    "TableName": {
                        "comment": "description",
                        "is_vocabulary": bool,
                        "is_asset": bool,
                        "is_association": bool,
                        "columns": [...],
                        "foreign_keys": [...],
                        "features": [...]
                    }
                }
            }
        }
    }

Source code in src/deriva_ml/model/catalog.py
def get_schema_description(self, include_system_columns: bool = False) -> dict[str, Any]:
    """Return a JSON description of the catalog schema structure.

    Provides a structured representation of the domain and ML schemas including
    tables, columns, foreign keys, and relationships. Useful for understanding
    the data model structure programmatically.

    Args:
        include_system_columns: If True, include RID, RCT, RMT, RCB, RMB columns.
            Default False to reduce output size.

    Returns:
        Dictionary with schema structure:
        {
            "domain_schemas": ["schema_name1", "schema_name2"],
            "default_schema": "schema_name1",
            "ml_schema": "deriva-ml",
            "schemas": {
                "schema_name": {
                    "tables": {
                        "TableName": {
                            "comment": "description",
                            "is_vocabulary": bool,
                            "is_asset": bool,
                            "is_association": bool,
                            "columns": [...],
                            "foreign_keys": [...],
                            "features": [...]
                        }
                    }
                }
            }
        }
    """
    system_columns = {"RID", "RCT", "RMT", "RCB", "RMB"}
    result = {
        "domain_schemas": sorted(self.domain_schemas),
        "default_schema": self.default_schema,
        "ml_schema": self.ml_schema,
        "schemas": {},
    }

    # Include all domain schemas and the ML schema
    for schema_name in [*self.domain_schemas, self.ml_schema]:
        schema = self.model.schemas.get(schema_name)
        if not schema:
            continue

        schema_info = {"tables": {}}

        for table_name, table in schema.tables.items():
            # Get columns
            columns = []
            for col in table.columns:
                if not include_system_columns and col.name in system_columns:
                    continue
                columns.append(
                    {
                        "name": col.name,
                        "type": str(col.type.typename),
                        "nullok": col.nullok,
                        "comment": col.comment or "",
                    }
                )

            # Get foreign keys
            foreign_keys = []
            for fk in table.foreign_keys:
                fk_cols = [c.name for c in fk.foreign_key_columns]
                ref_cols = [c.name for c in fk.referenced_columns]
                foreign_keys.append(
                    {
                        "columns": fk_cols,
                        "referenced_table": f"{fk.pk_table.schema.name}.{fk.pk_table.name}",
                        "referenced_columns": ref_cols,
                    }
                )

            # Get features if this is a domain table
            features = []
            if self.is_domain_schema(schema_name):
                try:
                    for f in self.find_features(table):
                        features.append(
                            {
                                "name": f.feature_name,
                                "feature_table": f.feature_table.name,
                            }
                        )
                except Exception as e:
                    logger.debug(f"Could not enumerate features for table {table.name}: {e}")

            table_info = {
                "comment": table.comment or "",
                "is_vocabulary": self.is_vocabulary(table),
                "is_asset": self.is_asset(table),
                "is_association": bool(self.is_association(table)),
                "columns": columns,
                "foreign_keys": foreign_keys,
            }
            if features:
                table_info["features"] = features

            schema_info["tables"][table_name] = table_info

        result["schemas"][schema_name] = schema_info

    return result
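The `include_system_columns` behavior is just a filter over the column list. A small self-contained mirror of that step (function name is illustrative):

```python
# Mirror of the column-filtering step inside get_schema_description:
# system columns are dropped unless include_system_columns is True.
SYSTEM_COLUMNS = {"RID", "RCT", "RMT", "RCB", "RMB"}

def describe_columns(columns: list[dict], include_system_columns: bool = False) -> list[dict]:
    return [c for c in columns
            if include_system_columns or c["name"] not in SYSTEM_COLUMNS]

cols = [{"name": "RID", "type": "ermrest_rid"}, {"name": "Name", "type": "text"}]
print([c["name"] for c in describe_columns(cols)])        # ['Name']
print([c["name"] for c in describe_columns(cols, True)])  # ['RID', 'Name']
```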

is_asset

is_asset(
    table_name: TableInput,
) -> bool

True if the specified table is a proper asset table.

Delegates to Table.is_asset() from deriva-py, which checks that:

- The required columns exist (URL, Filename, Length, MD5)
- URL, Length, and MD5 are NOT NULL
- URL carries the asset annotation

Parameters:

Name Type Description Default
table_name TableInput

Table name or Table object.

required

Returns:

Type Description
bool

True if the specified table is a proper asset table.

Source code in src/deriva_ml/model/catalog.py
def is_asset(self, table_name: TableInput) -> bool:
    """True if the specified table is a proper asset table.

    Delegates to Table.is_asset() from deriva-py which checks:
    - Required columns exist (URL, Filename, Length, MD5)
    - URL, Length, MD5 are NOT NULL
    - URL has the asset annotation

    Args:
        table_name: Table name or Table object.

    Returns:
        True if the specified table is a proper asset table.
    """
    table = self.name_to_table(table_name)
    return table.is_asset()

is_association

is_association(
    table_name: str | Table,
    unqualified: bool = True,
    pure: bool = True,
    min_arity: int = 2,
    max_arity: int = 2,
) -> bool | set[str] | int

Check the specified table to see if it is an association table.

Parameters:

Name Type Description Default
table_name str | Table

Table name or Table object to check.

required
unqualified bool

Passed through to Table.is_association(). (Default value = True)

True
pure bool

Passed through to Table.is_association(). (Default value = True)

True
min_arity int

Minimum number of foreign keys in the association. (Default value = 2)

2
max_arity int

Maximum number of foreign keys in the association. (Default value = 2)

2

Returns:

Type Description
bool | set[str] | int

The result of Table.is_association(); falsy when the table is not an association table.

Source code in src/deriva_ml/model/catalog.py
def is_association(
    self,
    table_name: str | Table,
    unqualified: bool = True,
    pure: bool = True,
    min_arity: int = 2,
    max_arity: int = 2,
) -> bool | set[str] | int:
    """Check the specified table to see if it is an association table.

    Args:
        table_name: param unqualified:
        pure: return: (Default value = True)
        table_name: str | Table:
        unqualified:  (Default value = True)

    Returns:


    """
    table = self.name_to_table(table_name)
    return table.is_association(unqualified=unqualified, pure=pure, min_arity=min_arity, max_arity=max_arity)

is_dataset_rid

is_dataset_rid(
    rid: RID, deleted: bool = False
) -> bool

Check if a given RID is a dataset RID.

Source code in src/deriva_ml/model/catalog.py
def is_dataset_rid(self, rid: RID, deleted: bool = False) -> bool:
    """Check if a given RID is a dataset RID."""
    try:
        rid_info = self.model.catalog.resolve_rid(rid, self.model)
    except KeyError as e:
        raise DerivaMLException(f"Invalid RID {rid}") from e
    if rid_info.table.name != "Dataset":
        return False
    elif deleted:
        # It is a dataset RID; deleted datasets are acceptable.
        return True
    else:
        # It is a dataset RID. Now check whether it has been deleted.
        return not list(rid_info.datapath.entities().fetch())[0]["Deleted"]

is_domain_schema

is_domain_schema(
    schema_name: str,
) -> bool

Check if a schema is a domain schema.

Parameters:

Name Type Description Default
schema_name str

Name of the schema to check.

required

Returns:

Type Description
bool

True if the schema is a domain schema.

Source code in src/deriva_ml/model/catalog.py
def is_domain_schema(self, schema_name: str) -> bool:
    """Check if a schema is a domain schema.

    Args:
        schema_name: Name of the schema to check.

    Returns:
        True if the schema is a domain schema.
    """
    return schema_name in self.domain_schemas

is_system_schema

is_system_schema(
    schema_name: str,
) -> bool

Check if a schema is a system or ML schema.

Parameters:

Name Type Description Default
schema_name str

Name of the schema to check.

required

Returns:

Type Description
bool

True if the schema is a system or ML schema.

Source code in src/deriva_ml/model/catalog.py
def is_system_schema(self, schema_name: str) -> bool:
    """Check if a schema is a system or ML schema.

    Args:
        schema_name: Name of the schema to check.

    Returns:
        True if the schema is a system or ML schema.
    """
    return _is_system_schema(schema_name, self.ml_schema)

is_vocabulary

is_vocabulary(
    table_name: TableInput,
) -> bool

Check if a given table is a controlled vocabulary table.

Delegates to Table.is_vocabulary() in deriva-py, which enforces both the required column names AND their types (ermrest_curie, ermrest_uri, text, markdown). The type check is stricter than a column-name-only check — a table with an ID column of the wrong type correctly returns False here where the legacy name-only implementation would have returned True.

Mirrors :meth:is_asset, which already delegates to Table.is_asset().

Parameters:

Name Type Description Default
table_name TableInput

An ERMrest Table object or the name of the table.

required

Returns:

Type Description
bool

True if the table has the structure of a controlled vocabulary,

bool

False otherwise.

Raises:

Type Description
DerivaMLException

if the table doesn't exist.

Source code in src/deriva_ml/model/catalog.py
def is_vocabulary(self, table_name: TableInput) -> bool:
    """Check if a given table is a controlled vocabulary table.

    Delegates to ``Table.is_vocabulary()`` in deriva-py, which enforces both
    the required column names AND their types (ermrest_curie, ermrest_uri,
    text, markdown). The type check is stricter than a column-name-only
    check — a table with an ``ID`` column of the wrong type correctly
    returns False here where the legacy name-only implementation would
    have returned True.

    Mirrors :meth:`is_asset`, which already delegates to ``Table.is_asset()``.

    Args:
        table_name: An ERMrest Table object or the name of the table.

    Returns:
        True if the table has the structure of a controlled vocabulary,
        False otherwise.

    Raises:
        DerivaMLException: if the table doesn't exist.
    """
    table = self.name_to_table(table_name)
    return table.is_vocabulary()

list_dataset_element_types

list_dataset_element_types() -> (
    list[Table]
)

Lists the data types of elements contained within a dataset.

This method analyzes the dataset and identifies the data types for all elements within it. It is useful for understanding the structure and content of the dataset and allows for better manipulation and usage of its data.

Returns:

Type Description
list[Table]

A list of Table objects, one for each element type that can appear in a dataset.

Source code in src/deriva_ml/model/catalog.py
def list_dataset_element_types(self) -> list[Table]:
    """
    Lists the data types of elements contained within a dataset.

    This method analyzes the dataset and identifies the data types for all
    elements within it. It is useful for understanding the structure and
    content of the dataset and allows for better manipulation and usage of its
    data.

    Returns:
        list[Table]: A list of Table objects, one for each element type
        that can appear in a dataset.

    """

    dataset_table = self.name_to_table("Dataset")

    def is_domain_or_dataset_table(table: Table) -> bool:
        return self.is_domain_schema(table.schema.name) or table.name == dataset_table.name

    return [
        t
        for a in dataset_table.find_associations()
        if is_domain_or_dataset_table(t := a.other_fkeys.pop().pk_table)
    ]

lookup_feature

lookup_feature(
    table: TableInput, feature_name: str
) -> Feature

Lookup the named feature associated with the provided table.

Parameters:

Name Type Description Default
table TableInput

Table name or Table object the feature is associated with.

required
feature_name str

Name of the feature to look up.

required

Returns:

Type Description
Feature

A Feature class that represents the requested feature.

Raises:

Type Description
DerivaMLException

If the feature cannot be found.

Source code in src/deriva_ml/model/catalog.py
def lookup_feature(self, table: TableInput, feature_name: str) -> Feature:
    """Lookup the named feature associated with the provided table.

    Args:
        table: Table name or Table object the feature is associated with.
        feature_name: Name of the feature to look up.

    Returns:
        A Feature class that represents the requested feature.

    Raises:
      DerivaMLException: If the feature cannot be found.
    """
    table = self.name_to_table(table)
    try:
        return [f for f in self.find_features(table) if f.feature_name == feature_name][0]
    except IndexError:
        raise DerivaMLException(f"Feature {table.name}:{feature_name} doesn't exist.")
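The first-match-or-raise pattern used here can be shown on its own, with plain dicts standing in for Feature objects (names are illustrative):

```python
def lookup_by_name(items: list[dict], name: str) -> dict:
    """First item whose 'name' matches, or a clear error."""
    try:
        return [i for i in items if i["name"] == name][0]
    except IndexError:
        raise KeyError(f"Feature {name} doesn't exist.") from None

features = [{"name": "Quality"}, {"name": "BoundingBox"}]
print(lookup_by_name(features, "Quality"))  # {'name': 'Quality'}
```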

name_to_table

name_to_table(
    table: TableInput,
) -> Table

Return the table object corresponding to the given table name.

Searches domain schemas first (in sorted order), then ML schema, then WWW. If the table name appears in more than one schema, returns the first match.

Parameters:

Name Type Description Default
table TableInput

An ERMrest table object or a string naming the table.

required

Returns:

Type Description
Table

Table object.

Raises:

Type Description
DerivaMLException

If the table doesn't exist in any searchable schema.

Source code in src/deriva_ml/model/catalog.py
def name_to_table(self, table: TableInput) -> Table:
    """Return the table object corresponding to the given table name.

    Searches domain schemas first (in sorted order), then ML schema, then WWW.
    If the table name appears in more than one schema, returns the first match.

    Args:
      table: An ERMrest table object or a string naming the table.

    Returns:
      Table object.

    Raises:
      DerivaMLException: If the table doesn't exist in any searchable schema.
    """
    if isinstance(table, Table):
        return table

    # Search domain schemas (sorted for deterministic order), then ML schema, then WWW
    search_order = [*sorted(self.domain_schemas), self.ml_schema, "WWW"]
    for sname in search_order:
        if sname not in self.model.schemas:
            continue
        s = self.model.schemas[sname]
        if table in s.tables:
            return s.tables[table]
    raise DerivaMLException(f"The table {table} doesn't exist.")
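The search-order semantics (sorted domain schemas, then ML schema, then WWW) can be mirrored with plain dicts; this sketch is illustrative, not the library's implementation:

```python
def name_to_table(schemas: dict[str, dict[str, str]],
                  domain_schemas: set[str],
                  ml_schema: str,
                  table: str) -> str:
    """Mirror the search order: sorted domain schemas, then ML schema, then WWW."""
    search_order = [*sorted(domain_schemas), ml_schema, "WWW"]
    for sname in search_order:
        tables = schemas.get(sname)
        if tables and table in tables:
            return tables[table]
    raise KeyError(f"The table {table} doesn't exist.")

schemas = {"domain": {"Subject": "domain.Subject"},
           "deriva-ml": {"Dataset": "deriva-ml.Dataset",
                         "Subject": "deriva-ml.Subject"}}
# Domain schemas win over the ML schema when names collide:
print(name_to_table(schemas, {"domain"}, "deriva-ml", "Subject"))  # domain.Subject
```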

vocab_columns

vocab_columns(
    table_name: TableInput,
) -> dict[str, str]

Return mapping from canonical vocab column name to actual column name.

Canonical names are TitleCase (Name, ID, URI, Description, Synonyms). Actual names reflect the table's schema — could be lowercase for FaceBase-style catalogs or TitleCase for DerivaML-native tables.

Parameters:

Name Type Description Default
table_name TableInput

A table object or the name of the table.

required

Returns:

Type Description
dict[str, str]

Dict mapping canonical name to actual column name in the table.

dict[str, str]

E.g. {"Name": "name", "ID": "id", ...} for FaceBase tables

dict[str, str]

or {"Name": "Name", "ID": "ID", ...} for DerivaML tables.

Source code in src/deriva_ml/model/catalog.py
def vocab_columns(self, table_name: TableInput) -> dict[str, str]:
    """Return mapping from canonical vocab column name to actual column name.

    Canonical names are TitleCase (Name, ID, URI, Description, Synonyms).
    Actual names reflect the table's schema — could be lowercase for
    FaceBase-style catalogs or TitleCase for DerivaML-native tables.

    Args:
        table_name: A table object or the name of the table.

    Returns:
        Dict mapping canonical name to actual column name in the table.
        E.g. ``{"Name": "name", "ID": "id", ...}`` for FaceBase tables
        or ``{"Name": "Name", "ID": "ID", ...}`` for DerivaML tables.
    """
    table = self.name_to_table(table_name)
    col_map = {c.name.upper(): c.name for c in table.columns}
    return {canon: col_map[canon.upper()] for canon in ("Name", "ID", "URI", "Description", "Synonyms")}
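The case-insensitive canonicalization is self-contained enough to demonstrate directly on a list of column names (this sketch mirrors the method's logic rather than calling the library):

```python
def vocab_columns(column_names: list[str]) -> dict[str, str]:
    """Map canonical TitleCase vocab names to the table's actual column names."""
    col_map = {name.upper(): name for name in column_names}
    return {canon: col_map[canon.upper()]
            for canon in ("Name", "ID", "URI", "Description", "Synonyms")}

# FaceBase-style lowercase columns:
print(vocab_columns(["name", "id", "uri", "description", "synonyms"]))
# {'Name': 'name', 'ID': 'id', 'URI': 'uri', 'Description': 'description', 'Synonyms': 'synonyms'}
```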

Display dataclass

Bases: AnnotationBuilder

Display annotation for tables and columns.

Controls the display name, description/tooltip, and how null values and foreign key links are rendered. Can be applied to both tables and columns.

Parameters:

Name Type Description Default
name str | None

Display name shown in the UI (mutually exclusive with markdown_name)

None
markdown_name str | None

Markdown-formatted display name (mutually exclusive with name)

None
name_style NameStyle | None

Styling options for automatic name formatting

None
comment str | None

Description text shown as tooltip/help text

None
show_null dict[str, bool | str] | None

How to display null values, per context

None
show_foreign_key_link dict[str, bool] | None

Whether to show FK values as links, per context

None

Raises:

Type Description
ValueError

If both name and markdown_name are provided

Example

Basic display name::

>>> display = Display(name="Research Subjects")  # doctest: +SKIP
>>> handle.set_annotation(display)

With description/tooltip::

>>> display = Display(
...     name="Subjects",
...     comment="Individuals enrolled in research studies"
... )

Markdown-formatted name::

>>> display = Display(markdown_name="**Bold** _Italic_ Name")

Context-specific null display::

>>> from deriva_ml.model import CONTEXT_COMPACT, CONTEXT_DETAILED
>>> display = Display(
...     name="Value",
...     show_null={
...         CONTEXT_COMPACT: False,      # Hide nulls in lists
...         CONTEXT_DETAILED: '"N/A"'    # Show "N/A" string
...     }
... )

Control foreign key link display::

>>> display = Display(
...     name="Subject",
...     show_foreign_key_link={CONTEXT_COMPACT: False}
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class Display(AnnotationBuilder):
    """Display annotation for tables and columns.

    Controls the display name, description/tooltip, and how null values
    and foreign key links are rendered. Can be applied to both tables
    and columns.

    Args:
        name: Display name shown in the UI (mutually exclusive with markdown_name)
        markdown_name: Markdown-formatted display name (mutually exclusive with name)
        name_style: Styling options for automatic name formatting
        comment: Description text shown as tooltip/help text
        show_null: How to display null values, per context
        show_foreign_key_link: Whether to show FK values as links, per context

    Raises:
        ValueError: If both name and markdown_name are provided

    Example:
        Basic display name::

            >>> display = Display(name="Research Subjects")  # doctest: +SKIP
            >>> handle.set_annotation(display)

        With description/tooltip::

            >>> display = Display(
            ...     name="Subjects",
            ...     comment="Individuals enrolled in research studies"
            ... )

        Markdown-formatted name::

            >>> display = Display(markdown_name="**Bold** _Italic_ Name")

        Context-specific null display::

            >>> from deriva_ml.model import CONTEXT_COMPACT, CONTEXT_DETAILED
            >>> display = Display(
            ...     name="Value",
            ...     show_null={
            ...         CONTEXT_COMPACT: False,      # Hide nulls in lists
            ...         CONTEXT_DETAILED: '"N/A"'    # Show "N/A" string
            ...     }
            ... )

        Control foreign key link display::

            >>> display = Display(
            ...     name="Subject",
            ...     show_foreign_key_link={CONTEXT_COMPACT: False}
            ... )
    """
    tag = TAG_DISPLAY

    name: str | None = None
    markdown_name: str | None = None
    name_style: NameStyle | None = None
    comment: str | None = None
    show_null: dict[str, bool | str] | None = None
    show_foreign_key_link: dict[str, bool] | None = None

    def __post_init__(self):
        if self.name and self.markdown_name:
            raise ValueError("name and markdown_name are mutually exclusive")

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.name is not None:
            result["name"] = self.name
        if self.markdown_name is not None:
            result["markdown_name"] = self.markdown_name
        if self.name_style is not None:
            style_dict = self.name_style.to_dict()
            if style_dict:
                result["name_style"] = style_dict
        if self.comment is not None:
            result["comment"] = self.comment
        if self.show_null is not None:
            result["show_null"] = self.show_null
        if self.show_foreign_key_link is not None:
            result["show_foreign_key_link"] = self.show_foreign_key_link
        return result

Facet dataclass

A facet definition for filtering.

Parameters:

- source (str | list[str | InboundFK | OutboundFK] | None, default None): Path to source data
- sourcekey (str | None, default None): Reference to named source
- markdown_name (str | None, default None): Display name
- comment (str | None, default None): Description
- entity (bool | None, default None): Whether this is an entity facet
- open (bool | None, default None): Start expanded
- ux_mode (FacetUxMode | None, default None): UI mode (choices, ranges, check_presence)
- bar_plot (bool | None, default None): Show bar plot
- choices (list[Any] | None, default None): Preset choice values
- ranges (list[FacetRange] | None, default None): Preset range values
- not_null (bool | None, default None): Filter to non-null values
- hide_null_choice (bool | None, default None): Hide "null" option
- hide_not_null_choice (bool | None, default None): Hide "not null" option
- n_bins (int | None, default None): Number of bins for histogram
Source code in src/deriva_ml/model/annotations.py
@dataclass
class Facet:
    """A facet definition for filtering.

    Args:
        source: Path to source data
        sourcekey: Reference to named source
        markdown_name: Display name
        comment: Description
        entity: Whether this is an entity facet
        open: Start expanded
        ux_mode: UI mode (choices, ranges, check_presence)
        bar_plot: Show bar plot
        choices: Preset choice values
        ranges: Preset range values
        not_null: Filter to non-null values
        hide_null_choice: Hide "null" option
        hide_not_null_choice: Hide "not null" option
        n_bins: Number of bins for histogram
    """
    source: str | list[str | InboundFK | OutboundFK] | None = None
    sourcekey: str | None = None
    markdown_name: str | None = None
    comment: str | None = None
    entity: bool | None = None
    open: bool | None = None
    ux_mode: FacetUxMode | None = None
    bar_plot: bool | None = None
    choices: list[Any] | None = None
    ranges: list[FacetRange] | None = None
    not_null: bool | None = None
    hide_null_choice: bool | None = None
    hide_not_null_choice: bool | None = None
    n_bins: int | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}

        if self.source is not None:
            if isinstance(self.source, str):
                result["source"] = self.source
            else:
                result["source"] = [
                    item.to_dict() if hasattr(item, "to_dict") else item
                    for item in self.source
                ]

        if self.sourcekey is not None:
            result["sourcekey"] = self.sourcekey
        if self.markdown_name is not None:
            result["markdown_name"] = self.markdown_name
        if self.comment is not None:
            result["comment"] = self.comment
        if self.entity is not None:
            result["entity"] = self.entity
        if self.open is not None:
            result["open"] = self.open
        if self.ux_mode is not None:
            result["ux_mode"] = self.ux_mode.value
        if self.bar_plot is not None:
            result["bar_plot"] = self.bar_plot
        if self.choices is not None:
            result["choices"] = self.choices
        if self.ranges is not None:
            result["ranges"] = [r.to_dict() for r in self.ranges]
        if self.not_null is not None:
            result["not_null"] = self.not_null
        if self.hide_null_choice is not None:
            result["hide_null_choice"] = self.hide_null_choice
        if self.hide_not_null_choice is not None:
            result["hide_not_null_choice"] = self.hide_not_null_choice
        if self.n_bins is not None:
            result["n_bins"] = self.n_bins

        return result

FacetList dataclass

A list of facets for filtering (visible_columns.filter).

Example

    facets = FacetList([
        Facet(source="Species", open=True),
        Facet(source="Age", ux_mode=FacetUxMode.RANGES)
    ])

Source code in src/deriva_ml/model/annotations.py
@dataclass
class FacetList:
    """A list of facets for filtering (visible_columns.filter).

    Example:
        >>> facets = FacetList([
        ...     Facet(source="Species", open=True),
        ...     Facet(source="Age", ux_mode=FacetUxMode.RANGES)
        ... ])
    """
    facets: list[Facet] = field(default_factory=list)

    def add(self, facet: Facet) -> "FacetList":
        """Add a facet to the list."""
        self.facets.append(facet)
        return self

    def to_dict(self) -> dict[str, list[dict]]:
        return {"and": [f.to_dict() for f in self.facets]}

add

add(facet: Facet) -> 'FacetList'

Add a facet to the list.

Source code in src/deriva_ml/model/annotations.py
def add(self, facet: Facet) -> "FacetList":
    """Add a facet to the list."""
    self.facets.append(facet)
    return self

FacetRange dataclass

A range for facet filtering.

Parameters:

- min (float | None, default None): Minimum value
- max (float | None, default None): Maximum value
- min_exclusive (bool | None, default None): Exclude min value
- max_exclusive (bool | None, default None): Exclude max value
Source code in src/deriva_ml/model/annotations.py
@dataclass
class FacetRange:
    """A range for facet filtering.

    Args:
        min: Minimum value
        max: Maximum value
        min_exclusive: Exclude min value
        max_exclusive: Exclude max value
    """
    min: float | None = None
    max: float | None = None
    min_exclusive: bool | None = None
    max_exclusive: bool | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.min is not None:
            result["min"] = self.min
        if self.max is not None:
            result["max"] = self.max
        if self.min_exclusive is not None:
            result["min_exclusive"] = self.min_exclusive
        if self.max_exclusive is not None:
            result["max_exclusive"] = self.max_exclusive
        return result

FacetUxMode

Bases: str, Enum

UX modes for facet filters in the search panel.

Controls how users interact with a facet filter.

Attributes:

- CHOICES: Checkbox list for selecting values
- RANGES: Range slider/inputs for numeric or date ranges
- CHECK_PRESENCE: Check if value exists or is null

Example

    # Choice-based facet
    Facet(source="Status", ux_mode=FacetUxMode.CHOICES)

    # Range-based facet for numeric values
    Facet(source="Age", ux_mode=FacetUxMode.RANGES)

    # Check presence (has value / no value)
    Facet(source="Notes", ux_mode=FacetUxMode.CHECK_PRESENCE)

Source code in src/deriva_ml/model/annotations.py
class FacetUxMode(str, Enum):
    """UX modes for facet filters in the search panel.

    Controls how users interact with a facet filter.

    Attributes:
        CHOICES: Checkbox list for selecting values
        RANGES: Range slider/inputs for numeric or date ranges
        CHECK_PRESENCE: Check if value exists or is null

    Example:
        >>> # Choice-based facet
        >>> Facet(source="Status", ux_mode=FacetUxMode.CHOICES)
        >>>
        >>> # Range-based facet for numeric values
        >>> Facet(source="Age", ux_mode=FacetUxMode.RANGES)
        >>>
        >>> # Check presence (has value / no value)
        >>> Facet(source="Notes", ux_mode=FacetUxMode.CHECK_PRESENCE)
    """
    CHOICES = "choices"
    RANGES = "ranges"
    CHECK_PRESENCE = "check_presence"
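Because `FacetUxMode` subclasses both `str` and `Enum`, members compare equal to their raw string values and serialize to plain strings without a custom JSON encoder, which is why `Facet.to_dict()` can simply emit `ux_mode.value`. A quick stdlib demonstration, using the illustrative `UxModeSketch` rather than the real class:

```python
import json
from enum import Enum


class UxModeSketch(str, Enum):
    """Illustrative mirror of the str+Enum pattern used by FacetUxMode."""
    CHOICES = "choices"
    RANGES = "ranges"
    CHECK_PRESENCE = "check_presence"


# The str mixin means members compare equal to their raw values...
assert UxModeSketch.CHOICES == "choices"
# ...and json.dumps emits the underlying string, no custom encoder needed.
print(json.dumps({"ux_mode": UxModeSketch.RANGES}))  # {"ux_mode": "ranges"}
```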

ForeignKeyOrderer

Computes insertion order for tables based on FK dependencies.

Uses topological sort to ensure referenced tables are populated before tables that reference them. Handles cycles by either raising an error or breaking them.

Example

    orderer = ForeignKeyOrderer(model, schemas=['domain', 'deriva-ml'])

    # Get insertion order
    tables_to_fill = ['Image', 'Subject', 'Diagnosis']
    ordered = orderer.get_insertion_order(tables_to_fill)
    # Returns: ['Subject', 'Image', 'Diagnosis']

    # Get all tables in safe order
    all_ordered = orderer.get_insertion_order()

    # Get FK dependencies for a table
    deps = orderer.get_dependencies('Image')
    # Returns: {'Subject', 'Dataset', ...}
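Under the hood, the ordering relies on the standard library's `graphlib.TopologicalSorter`. A self-contained sketch with a toy dependency graph (the table names are illustrative) shows the core idea:

```python
from graphlib import TopologicalSorter

# Toy FK dependency graph, shaped like _build_dependency_graph() output:
# each table maps to the set of tables it references via foreign keys.
graph = {
    "Image": {"Subject"},
    "Diagnosis": {"Image", "Subject"},
    "Subject": set(),
}

# static_order() yields dependencies before their dependents, so inserting
# rows in this order never violates an FK constraint.
order = list(TopologicalSorter(graph).static_order())
assert order.index("Subject") < order.index("Image") < order.index("Diagnosis")
print(order)
```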

Source code in src/deriva_ml/model/fk_orderer.py
class ForeignKeyOrderer:
    """Computes insertion order for tables based on FK dependencies.

    Uses topological sort to ensure referenced tables are populated
    before tables that reference them. Handles cycles by either
    raising an error or breaking them.

    Example:
        orderer = ForeignKeyOrderer(model, schemas=['domain', 'deriva-ml'])

        # Get insertion order
        tables_to_fill = ['Image', 'Subject', 'Diagnosis']
        ordered = orderer.get_insertion_order(tables_to_fill)
        # Returns: ['Subject', 'Image', 'Diagnosis']

        # Get all tables in safe order
        all_ordered = orderer.get_insertion_order()

        # Get FK dependencies for a table
        deps = orderer.get_dependencies('Image')
        # Returns: {'Subject', 'Dataset', ...}
    """

    def __init__(
        self,
        model: Model,
        schemas: list[str],
    ):
        """Initialize the orderer.

        Args:
            model: ERMrest Model object.
            schemas: Schemas to consider for FK relationships.
        """
        self.model = model
        self.schemas = set(schemas)
        self._table_cache: dict[str, DerivaTable] = {}
        self._build_table_cache()

    def _build_table_cache(self) -> None:
        """Build cache mapping table names to Table objects."""
        for schema_name in self.schemas:
            if schema_name not in self.model.schemas:
                continue
            schema = self.model.schemas[schema_name]
            for table_name, table in schema.tables.items():
                # Store both qualified and unqualified names
                self._table_cache[f"{schema_name}.{table_name}"] = table
                # Only store unqualified if not already present (avoids conflicts)
                if table_name not in self._table_cache:
                    self._table_cache[table_name] = table

    def _to_table(self, t: str | DerivaTable) -> DerivaTable:
        """Convert table name to Table object.

        Args:
            t: Table name or Table object.

        Returns:
            DerivaTable object.

        Raises:
            ValueError: If table not found.
        """
        if isinstance(t, DerivaTable):
            return t

        if t in self._table_cache:
            return self._table_cache[t]

        raise ValueError(f"Table {t} not found in schemas {self.schemas}")

    def _table_key(self, t: DerivaTable) -> str:
        """Get unique key for a table."""
        return f"{t.schema.name}.{t.name}"

    def get_dependencies(self, table: str | DerivaTable) -> set[DerivaTable]:
        """Get tables that this table depends on (FK targets).

        Args:
            table: Table name or object.

        Returns:
            Set of tables that must be populated before this table.
        """
        t = self._to_table(table)
        dependencies = set()

        for fk in t.foreign_keys:
            pk_table = fk.pk_table
            # Only include dependencies within our schemas
            if pk_table.schema.name in self.schemas:
                # Don't include self-references as dependencies
                if self._table_key(pk_table) != self._table_key(t):
                    dependencies.add(pk_table)

        return dependencies

    def get_dependents(self, table: str | DerivaTable) -> set[DerivaTable]:
        """Get tables that depend on this table (FK sources).

        Args:
            table: Table name or object.

        Returns:
            Set of tables that reference this table.
        """
        t = self._to_table(table)
        dependents = set()

        for schema_name in self.schemas:
            if schema_name not in self.model.schemas:
                continue

            for other_table in self.model.schemas[schema_name].tables.values():
                if self._table_key(other_table) == self._table_key(t):
                    continue

                for fk in other_table.foreign_keys:
                    if self._table_key(fk.pk_table) == self._table_key(t):
                        dependents.add(other_table)
                        break

        return dependents

    def _build_dependency_graph(
        self,
        tables: list[str | DerivaTable] | None = None,
    ) -> dict[str, set[str]]:
        """Build FK dependency graph.

        Args:
            tables: Tables to include. If None, includes all tables.

        Returns:
            Dict mapping table key -> set of table keys it depends on.
        """
        if tables is None:
            # Include all tables in schemas
            table_objs = []
            for schema_name in self.schemas:
                if schema_name in self.model.schemas:
                    table_objs.extend(self.model.schemas[schema_name].tables.values())
        else:
            table_objs = [self._to_table(t) for t in tables]

        table_keys = {self._table_key(t) for t in table_objs}
        graph: dict[str, set[str]] = {}

        for t in table_objs:
            key = self._table_key(t)
            deps = set()

            for fk in t.foreign_keys:
                pk_key = self._table_key(fk.pk_table)
                # Only include deps within our table set
                if pk_key in table_keys and pk_key != key:
                    deps.add(pk_key)

            graph[key] = deps

        return graph

    def get_insertion_order(
        self,
        tables: list[str | DerivaTable] | None = None,
        handle_cycles: bool = True,
    ) -> list[DerivaTable]:
        """Compute FK-safe insertion order for the given tables.

        Returns tables ordered so that all FK dependencies are satisfied
        when inserting in order.

        Args:
            tables: Tables to order. If None, orders all tables in schemas.
            handle_cycles: If True, break cycles by removing edges.
                If False, raise CycleError on cycles.

        Returns:
            Ordered list of Table objects (insert from first to last).

        Raises:
            CycleError: If handle_cycles=False and cycles exist.
        """
        graph = self._build_dependency_graph(tables)

        try:
            ts = TopologicalSorter(graph)
            ordered_keys = list(ts.static_order())
        except CycleError as e:
            if handle_cycles:
                ordered_keys = self._break_cycles_and_sort(graph, e)
            else:
                raise

        # Convert keys back to Table objects
        return [self._table_cache[key] for key in ordered_keys]

    def get_deletion_order(
        self,
        tables: list[str | DerivaTable] | None = None,
        handle_cycles: bool = True,
    ) -> list[DerivaTable]:
        """Compute FK-safe deletion order for the given tables.

        Returns tables in reverse dependency order - tables that are
        referenced should be deleted last.

        Args:
            tables: Tables to order. If None, orders all tables in schemas.
            handle_cycles: If True, break cycles. If False, raise on cycles.

        Returns:
            Ordered list of Table objects (delete from first to last).
        """
        insertion_order = self.get_insertion_order(tables, handle_cycles)
        return list(reversed(insertion_order))

    def _break_cycles_and_sort(
        self,
        graph: dict[str, set[str]],
        error: CycleError,
        _depth: int = 0,
    ) -> list[str]:
        """Handle cycles by breaking them and re-sorting.

        Uses a simple strategy of removing edges from cycle members
        until no cycles remain.

        Args:
            graph: Dependency graph.
            error: CycleError with cycle info.

        Returns:
            Ordered list of table keys.
        """
        max_depth = len(graph)  # Can't have more cycles than edges
        if _depth > max_depth:
            logger.error("Too many cycles to break, returning arbitrary order")
            return list(graph.keys())

        # Get cycle from error message.
        # CycleError.args[1] is like ['A', 'B', 'C', 'A'] where first == last.
        cycle = list(error.args[1]) if len(error.args) > 1 else []

        if cycle:
            logger.warning(f"Breaking cycle in FK dependencies: {' -> '.join(cycle)}")

            # Remove one edge from the cycle to break it.
            # cycle[-1] == cycle[0], so the unique nodes are cycle[:-1].
            # Each consecutive pair cycle[i] -> cycle[i+1] corresponds to
            # graph[cycle[i+1]] containing cycle[i] (i.e., cycle[i+1] depends on cycle[i]).
            # Remove the last real edge: cycle[-2] from graph[cycle[-1]].
            edge_removed = False
            if len(cycle) >= 3:
                dep_node = cycle[-2]  # the dependency
                node = cycle[-1]      # the node that depends on dep_node
                if node in graph and dep_node in graph[node]:
                    graph[node].remove(dep_node)
                    logger.debug(f"Removed dependency {node} -> {dep_node}")
                    edge_removed = True

            if not edge_removed:
                # Try removing any edge in the cycle
                for i in range(len(cycle) - 1):
                    dep_node, node = cycle[i], cycle[i + 1]
                    if node in graph and dep_node in graph[node]:
                        graph[node].remove(dep_node)
                        logger.debug(f"Removed dependency {node} -> {dep_node}")
                        edge_removed = True
                        break

        # Try again
        try:
            ts = TopologicalSorter(graph)
            return list(ts.static_order())
        except CycleError as e:
            # Recursively break more cycles
            return self._break_cycles_and_sort(graph, e, _depth + 1)

    def validate_insertion_order(
        self,
        tables: list[str | DerivaTable],
    ) -> list[tuple[str, str, str]]:
        """Validate that a list of tables can be inserted in order.

        Checks each table to ensure all its FK dependencies are
        satisfied by tables earlier in the list.

        Args:
            tables: Ordered list of tables to validate.

        Returns:
            List of (table, missing_dependency, fk_name) tuples for
            any unsatisfied dependencies. Empty list if valid.
        """
        table_objs = [self._to_table(t) for t in tables]
        seen_keys = set()
        violations = []

        for t in table_objs:
            key = self._table_key(t)

            for fk in t.foreign_keys:
                pk_key = self._table_key(fk.pk_table)
                # Skip self-references and tables not in our set
                if pk_key == key:
                    continue
                if pk_key not in {self._table_key(x) for x in table_objs}:
                    continue

                if pk_key not in seen_keys:
                    violations.append((key, pk_key, fk.name[1]))

            seen_keys.add(key)

        return violations

    def get_all_tables(self) -> list[DerivaTable]:
        """Get all tables in configured schemas.

        Returns:
            List of all Table objects.
        """
        tables = []
        for schema_name in self.schemas:
            if schema_name in self.model.schemas:
                tables.extend(self.model.schemas[schema_name].tables.values())
        return tables

    def find_cycles(self) -> list[list[str]]:
        """Find all FK dependency cycles in the schema.

        Returns:
            List of cycles, each cycle is a list of table keys.
        """
        graph = self._build_dependency_graph()
        cycles = []

        # Use DFS to find cycles
        visited = set()
        rec_stack = set()
        path = []

        def dfs(node: str) -> bool:
            visited.add(node)
            rec_stack.add(node)
            path.append(node)

            for neighbor in graph.get(node, set()):
                if neighbor not in visited:
                    if dfs(neighbor):
                        return True
                elif neighbor in rec_stack:
                    # Found cycle
                    idx = path.index(neighbor)
                    cycle = path[idx:] + [neighbor]
                    cycles.append(cycle)

            path.pop()
            rec_stack.remove(node)
            return False

        for node in graph:
            if node not in visited:
                dfs(node)

        return cycles
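`_break_cycles_and_sort` exploits the documented shape of `graphlib.CycleError`: its `args[1]` lists the cycle with the first node repeated at the end, and each `cycle[i]` is a dependency (predecessor) of `cycle[i+1]`. A minimal standalone sketch of the catch-break-retry strategy, on a toy graph rather than real catalog tables:

```python
from graphlib import CycleError, TopologicalSorter

# Toy graph with a two-table FK cycle: A <-> B.
graph = {"A": {"B"}, "B": {"A"}, "C": set()}

try:
    order = list(TopologicalSorter(graph).static_order())
except CycleError as err:
    # err.args[1] is e.g. ['A', 'B', 'A']: first == last, and each
    # cycle[i] is a dependency of cycle[i+1], so the last real edge
    # is cycle[-2] inside graph[cycle[-1]]. Drop that edge and retry.
    cycle = err.args[1]
    graph[cycle[-1]].discard(cycle[-2])
    order = list(TopologicalSorter(graph).static_order())

print(order)  # all three nodes, in an FK-safe order for the pruned graph
```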

__init__

__init__(model: Model, schemas: list[str])

Initialize the orderer.

Parameters:

- model (Model, required): ERMrest Model object.
- schemas (list[str], required): Schemas to consider for FK relationships.
Source code in src/deriva_ml/model/fk_orderer.py
def __init__(
    self,
    model: Model,
    schemas: list[str],
):
    """Initialize the orderer.

    Args:
        model: ERMrest Model object.
        schemas: Schemas to consider for FK relationships.
    """
    self.model = model
    self.schemas = set(schemas)
    self._table_cache: dict[str, DerivaTable] = {}
    self._build_table_cache()

find_cycles

find_cycles() -> list[list[str]]

Find all FK dependency cycles in the schema.

Returns:

- list[list[str]]: List of cycles; each cycle is a list of table keys.

Source code in src/deriva_ml/model/fk_orderer.py
def find_cycles(self) -> list[list[str]]:
    """Find all FK dependency cycles in the schema.

    Returns:
        List of cycles, each cycle is a list of table keys.
    """
    graph = self._build_dependency_graph()
    cycles = []

    # Use DFS to find cycles
    visited = set()
    rec_stack = set()
    path = []

    def dfs(node: str) -> bool:
        visited.add(node)
        rec_stack.add(node)
        path.append(node)

        for neighbor in graph.get(node, set()):
            if neighbor not in visited:
                if dfs(neighbor):
                    return True
            elif neighbor in rec_stack:
                # Found cycle
                idx = path.index(neighbor)
                cycle = path[idx:] + [neighbor]
                cycles.append(cycle)

        path.pop()
        rec_stack.remove(node)
        return False

    for node in graph:
        if node not in visited:
            dfs(node)

    return cycles

get_all_tables

get_all_tables() -> list[DerivaTable]

Get all tables in configured schemas.

Returns:

- list[DerivaTable]: List of all Table objects.

Source code in src/deriva_ml/model/fk_orderer.py
def get_all_tables(self) -> list[DerivaTable]:
    """Get all tables in configured schemas.

    Returns:
        List of all Table objects.
    """
    tables = []
    for schema_name in self.schemas:
        if schema_name in self.model.schemas:
            tables.extend(self.model.schemas[schema_name].tables.values())
    return tables

get_deletion_order

get_deletion_order(tables: list[str | DerivaTable] | None = None, handle_cycles: bool = True) -> list[DerivaTable]

Compute FK-safe deletion order for the given tables.

Returns tables in reverse dependency order - tables that are referenced should be deleted last.

Parameters:

- tables (list[str | DerivaTable] | None, default None): Tables to order. If None, orders all tables in schemas.
- handle_cycles (bool, default True): If True, break cycles. If False, raise on cycles.

Returns:

- list[DerivaTable]: Ordered list of Table objects (delete from first to last).

Source code in src/deriva_ml/model/fk_orderer.py
def get_deletion_order(
    self,
    tables: list[str | DerivaTable] | None = None,
    handle_cycles: bool = True,
) -> list[DerivaTable]:
    """Compute FK-safe deletion order for the given tables.

    Returns tables in reverse dependency order - tables that are
    referenced should be deleted last.

    Args:
        tables: Tables to order. If None, orders all tables in schemas.
        handle_cycles: If True, break cycles. If False, raise on cycles.

    Returns:
        Ordered list of Table objects (delete from first to last).
    """
    insertion_order = self.get_insertion_order(tables, handle_cycles)
    return list(reversed(insertion_order))

get_dependencies

get_dependencies(table: str | DerivaTable) -> set[DerivaTable]

Get tables that this table depends on (FK targets).

Parameters:

- table (str | DerivaTable, required): Table name or object.

Returns:

- set[DerivaTable]: Set of tables that must be populated before this table.

Source code in src/deriva_ml/model/fk_orderer.py
def get_dependencies(self, table: str | DerivaTable) -> set[DerivaTable]:
    """Get tables that this table depends on (FK targets).

    Args:
        table: Table name or object.

    Returns:
        Set of tables that must be populated before this table.
    """
    t = self._to_table(table)
    dependencies = set()

    for fk in t.foreign_keys:
        pk_table = fk.pk_table
        # Only include dependencies within our schemas
        if pk_table.schema.name in self.schemas:
            # Don't include self-references as dependencies
            if self._table_key(pk_table) != self._table_key(t):
                dependencies.add(pk_table)

    return dependencies

get_dependents

get_dependents(table: str | DerivaTable) -> set[DerivaTable]

Get tables that depend on this table (FK sources).

Parameters:

- table (str | DerivaTable, required): Table name or object.

Returns:

- set[DerivaTable]: Set of tables that reference this table.

Source code in src/deriva_ml/model/fk_orderer.py
def get_dependents(self, table: str | DerivaTable) -> set[DerivaTable]:
    """Get tables that depend on this table (FK sources).

    Args:
        table: Table name or object.

    Returns:
        Set of tables that reference this table.
    """
    t = self._to_table(table)
    dependents = set()

    for schema_name in self.schemas:
        if schema_name not in self.model.schemas:
            continue

        for other_table in self.model.schemas[schema_name].tables.values():
            if self._table_key(other_table) == self._table_key(t):
                continue

            for fk in other_table.foreign_keys:
                if self._table_key(fk.pk_table) == self._table_key(t):
                    dependents.add(other_table)
                    break

    return dependents

get_insertion_order

get_insertion_order(tables: list[str | DerivaTable] | None = None, handle_cycles: bool = True) -> list[DerivaTable]

Compute FK-safe insertion order for the given tables.

Returns tables ordered so that all FK dependencies are satisfied when inserting in order.

Parameters:

- tables (list[str | DerivaTable] | None, default None): Tables to order. If None, orders all tables in schemas.
- handle_cycles (bool, default True): If True, break cycles by removing edges. If False, raise CycleError on cycles.

Returns:

- list[DerivaTable]: Ordered list of Table objects (insert from first to last).

Raises:

- CycleError: If handle_cycles=False and cycles exist.

Source code in src/deriva_ml/model/fk_orderer.py
def get_insertion_order(
    self,
    tables: list[str | DerivaTable] | None = None,
    handle_cycles: bool = True,
) -> list[DerivaTable]:
    """Compute FK-safe insertion order for the given tables.

    Returns tables ordered so that all FK dependencies are satisfied
    when inserting in order.

    Args:
        tables: Tables to order. If None, orders all tables in schemas.
        handle_cycles: If True, break cycles by removing edges.
            If False, raise CycleError on cycles.

    Returns:
        Ordered list of Table objects (insert from first to last).

    Raises:
        CycleError: If handle_cycles=False and cycles exist.
    """
    graph = self._build_dependency_graph(tables)

    try:
        ts = TopologicalSorter(graph)
        ordered_keys = list(ts.static_order())
    except CycleError as e:
        if handle_cycles:
            ordered_keys = self._break_cycles_and_sort(graph, e)
        else:
            raise

    # Convert keys back to Table objects
    return [self._table_cache[key] for key in ordered_keys]
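
The ordering above relies on Python's standard-library TopologicalSorter; a minimal sketch with hypothetical table names shows how referenced tables sort before their referencers:

```python
from graphlib import TopologicalSorter

# Each key maps to the tables it references via FK (its predecessors,
# which must be inserted first). Table names are hypothetical.
graph = {
    "Image": {"Subject"},    # Image has an FK to Subject
    "Subject": {"Species"},  # Subject has an FK to Species
    "Species": set(),
}
order = list(TopologicalSorter(graph).static_order())
# Referenced tables come before the tables that reference them
assert order.index("Species") < order.index("Subject") < order.index("Image")
```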

validate_insertion_order

validate_insertion_order(
    tables: list[str | Table],
) -> list[tuple[str, str, str]]

Validate that a list of tables can be inserted in order.

Checks each table to ensure all its FK dependencies are satisfied by tables earlier in the list.

Parameters:

- tables (list[str | Table]): Ordered list of tables to validate. Required.

Returns:

- list[tuple[str, str, str]]: List of (table, missing_dependency, fk_name) tuples for any unsatisfied dependencies. Empty list if valid.

Source code in src/deriva_ml/model/fk_orderer.py
def validate_insertion_order(
    self,
    tables: list[str | DerivaTable],
) -> list[tuple[str, str, str]]:
    """Validate that a list of tables can be inserted in order.

    Checks each table to ensure all its FK dependencies are
    satisfied by tables earlier in the list.

    Args:
        tables: Ordered list of tables to validate.

    Returns:
        List of (table, missing_dependency, fk_name) tuples for
        any unsatisfied dependencies. Empty list if valid.
    """
    table_objs = [self._to_table(t) for t in tables]
    seen_keys = set()
    violations = []

    for t in table_objs:
        key = self._table_key(t)

        for fk in t.foreign_keys:
            pk_key = self._table_key(fk.pk_table)
            # Skip self-references and tables not in our set
            if pk_key == key:
                continue
            if pk_key not in {self._table_key(x) for x in table_objs}:
                continue

            if pk_key not in seen_keys:
                violations.append((key, pk_key, fk.name[1]))

        seen_keys.add(key)

    return violations
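
The check performed by validate_insertion_order can be sketched with plain dictionaries; the fks mapping and table names here are hypothetical:

```python
# Walk an ordered table list and flag any FK target not seen earlier.
fks = {"Image": ["Subject"], "Subject": ["Species"], "Species": []}
order = ["Image", "Subject", "Species"]  # deliberately wrong order

seen, violations = set(), []
for t in order:
    for dep in fks[t]:
        if dep not in seen:
            violations.append((t, dep))  # dep must be inserted before t
    seen.add(t)

assert violations == [("Image", "Subject"), ("Subject", "Species")]
```

Reversing the order to ["Species", "Subject", "Image"] yields no violations.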

InboundFK dataclass

An inbound foreign key path step for pseudo-column source paths.

Use this when following a foreign key FROM another table TO the current table. This is common when counting or aggregating related records.

Parameters:

- schema (str): Schema name containing the FK constraint. Required.
- constraint (str): Foreign key constraint name. Required.
Example

Count images related to a subject (Image has FK to Subject)::

>>> # In Subject table, count related images
>>> pc = PseudoColumn(
...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
...     aggregate=Aggregate.CNT,
...     markdown_name="Image Count"
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class InboundFK:
    """An inbound foreign key path step for pseudo-column source paths.

    Use this when following a foreign key FROM another table TO the current table.
    This is common when counting or aggregating related records.

    Args:
        schema: Schema name containing the FK constraint
        constraint: Foreign key constraint name

    Example:
        Count images related to a subject (Image has FK to Subject)::

            >>> # In Subject table, count related images
            >>> pc = PseudoColumn(
            ...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
            ...     aggregate=Aggregate.CNT,
            ...     markdown_name="Image Count"
            ... )
    """
    schema: str
    constraint: str

    def to_dict(self) -> dict[str, list[str]]:
        return {"inbound": [self.schema, self.constraint]}
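
To make the serialized form concrete, here is a minimal copy of the dataclass above (taken from the source shown) with its to_dict output; the constraint name is hypothetical:

```python
from dataclasses import dataclass

@dataclass
class InboundFK:
    schema: str
    constraint: str

    def to_dict(self) -> dict[str, list[str]]:
        # Wire format: a one-key dict tagging the step direction
        return {"inbound": [self.schema, self.constraint]}

step = InboundFK("domain", "Image_Subject_fkey")
assert step.to_dict() == {"inbound": ["domain", "Image_Subject_fkey"]}
```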

NameStyle dataclass

Styling options for automatic display name formatting.

Applied to table or column names when no explicit display name is set.

Parameters:

- underline_space (bool | None): Replace underscores with spaces (e.g., "First_Name" -> "First Name"). Default: None.
- title_case (bool | None): Apply title case formatting (e.g., "firstname" -> "Firstname"). Default: None.
- markdown (bool | None): Render the name as markdown. Default: None.
Example

Transform "Subject_ID" to "Subject Id" with title case::

>>> display = Display(
...     name_style=NameStyle(underline_space=True, title_case=True)
... )

Source code in src/deriva_ml/model/annotations.py
@dataclass
class NameStyle:
    """Styling options for automatic display name formatting.

    Applied to table or column names when no explicit display name is set.

    Args:
        underline_space: Replace underscores with spaces (e.g., "First_Name" -> "First Name")
        title_case: Apply title case formatting (e.g., "firstname" -> "Firstname")
        markdown: Render the name as markdown

    Example:
        >>> # Transform "Subject_ID" to "Subject Id" with title case
        >>> display = Display(
        ...     name_style=NameStyle(underline_space=True, title_case=True)
        ... )
    """
    underline_space: bool | None = None
    title_case: bool | None = None
    markdown: bool | None = None

    def to_dict(self) -> dict[str, bool]:
        """Convert to dictionary, excluding None values."""
        result = {}
        if self.underline_space is not None:
            result["underline_space"] = self.underline_space
        if self.title_case is not None:
            result["title_case"] = self.title_case
        if self.markdown is not None:
            result["markdown"] = self.markdown
        return result

to_dict

to_dict() -> dict[str, bool]

Convert to dictionary, excluding None values.

Source code in src/deriva_ml/model/annotations.py
def to_dict(self) -> dict[str, bool]:
    """Convert to dictionary, excluding None values."""
    result = {}
    if self.underline_space is not None:
        result["underline_space"] = self.underline_space
    if self.title_case is not None:
        result["title_case"] = self.title_case
    if self.markdown is not None:
        result["markdown"] = self.markdown
    return result

OutboundFK dataclass

An outbound foreign key path step for pseudo-column source paths.

Use this when following a foreign key FROM the current table TO another table. This is common when displaying values from referenced tables.

Parameters:

- schema (str): Schema name containing the FK constraint. Required.
- constraint (str): Foreign key constraint name. Required.
Example

Show species name from a related Species table::

>>> # Subject has FK to Species, display Species.Name
>>> pc = PseudoColumn(
...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
...     markdown_name="Species"
... )

Chain multiple outbound FKs::

>>> # Image -> Subject -> Species
>>> pc = PseudoColumn(
...     source=[
...         OutboundFK("domain", "Image_Subject_fkey"),
...         OutboundFK("domain", "Subject_Species_fkey"),
...         "Name"
...     ],
...     markdown_name="Species"
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class OutboundFK:
    """An outbound foreign key path step for pseudo-column source paths.

    Use this when following a foreign key FROM the current table TO another table.
    This is common when displaying values from referenced tables.

    Args:
        schema: Schema name containing the FK constraint
        constraint: Foreign key constraint name

    Example:
        Show species name from a related Species table::

            >>> # Subject has FK to Species, display Species.Name
            >>> pc = PseudoColumn(
            ...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
            ...     markdown_name="Species"
            ... )

        Chain multiple outbound FKs::

            >>> # Image -> Subject -> Species
            >>> pc = PseudoColumn(
            ...     source=[
            ...         OutboundFK("domain", "Image_Subject_fkey"),
            ...         OutboundFK("domain", "Subject_Species_fkey"),
            ...         "Name"
            ...     ],
            ...     markdown_name="Species"
            ... )
    """
    schema: str
    constraint: str

    def to_dict(self) -> dict[str, list[str]]:
        return {"outbound": [self.schema, self.constraint]}
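
A multi-hop source path serializes to a list of step dictionaries followed by the terminal column name. A sketch with a minimal copy of the dataclass above (constraint names hypothetical):

```python
from dataclasses import dataclass

@dataclass
class OutboundFK:
    schema: str
    constraint: str

    def to_dict(self) -> dict[str, list[str]]:
        return {"outbound": [self.schema, self.constraint]}

# Image -> Subject -> Species, ending in the Name column
path = [
    OutboundFK("domain", "Image_Subject_fkey"),
    OutboundFK("domain", "Subject_Species_fkey"),
    "Name",
]
wire = [s.to_dict() if hasattr(s, "to_dict") else s for s in path]
assert wire == [
    {"outbound": ["domain", "Image_Subject_fkey"]},
    {"outbound": ["domain", "Subject_Species_fkey"]},
    "Name",
]
```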

PreFormat dataclass

Pre-formatting options for column values.

Parameters:

- format (str | None): Printf-style format string (e.g., "%.2f"). Default: None.
- bool_true_value (str | None): Display value for True. Default: None.
- bool_false_value (str | None): Display value for False. Default: None.
Source code in src/deriva_ml/model/annotations.py
@dataclass
class PreFormat:
    """Pre-formatting options for column values.

    Args:
        format: Printf-style format string (e.g., "%.2f")
        bool_true_value: Display value for True
        bool_false_value: Display value for False
    """
    format: str | None = None
    bool_true_value: str | None = None
    bool_false_value: str | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.format is not None:
            result["format"] = self.format
        if self.bool_true_value is not None:
            result["bool_true_value"] = self.bool_true_value
        if self.bool_false_value is not None:
            result["bool_false_value"] = self.bool_false_value
        return result
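
The format field takes a printf-style string; Python's own %-formatting illustrates the effect of patterns like "%.2f":

```python
# "%.2f" keeps two decimal places; "%05d" zero-pads to width 5
assert "%.2f" % 3.14159 == "3.14"
assert "%05d" % 42 == "00042"
```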

PseudoColumn dataclass

A pseudo-column definition for visible columns and foreign keys.

Pseudo-columns display computed values, values from related tables, or custom markdown patterns. They appear as columns in table views but are not actual database columns.

Parameters:

- source (str | list[str | InboundFK | OutboundFK] | None): Path to source data. Can be a column name (string) or a list of FK path steps ending with a column name. Default: None.
- sourcekey (str | None): Reference to a named source in the source-definitions annotation. Default: None.
- markdown_name (str | None): Display name for the column (supports markdown). Default: None.
- comment (str | Literal[False] | None): Description/tooltip text (or False to hide). Default: None.
- entity (bool | None): Whether this represents an entity (affects rendering). Default: None.
- aggregate (Aggregate | None): Aggregation function when source returns multiple values. Default: None.
- self_link (bool | None): Make the value a link to the current row. Default: None.
- display (PseudoColumnDisplay | None): Display formatting options. Default: None.
- array_options (dict[str, Any] | None): Options for array aggregates (max_length, order). Default: None.

Note

source and sourcekey are mutually exclusive. Use source for inline definitions, sourcekey to reference pre-defined sources.

Raises:

- ValueError: If both source and sourcekey are provided.

Example

Simple column with custom display name::

>>> PseudoColumn(source="Internal_ID", markdown_name="ID")

Outbound FK traversal (display value from referenced table)::

>>> # Subject has FK to Species - show Species.Name
>>> PseudoColumn(
...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
...     markdown_name="Species"
... )

Inbound FK with aggregation (count related records)::

>>> # Count images pointing to this subject
>>> PseudoColumn(
...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
...     aggregate=Aggregate.CNT,
...     markdown_name="Images"
... )

Multi-hop FK path::

>>> # Image -> Subject -> Species
>>> PseudoColumn(
...     source=[
...         OutboundFK("domain", "Image_Subject_fkey"),
...         OutboundFK("domain", "Subject_Species_fkey"),
...         "Name"
...     ],
...     markdown_name="Species"
... )

With custom display formatting::

>>> PseudoColumn(
...     source="URL",
...     display=PseudoColumnDisplay(
...         markdown_pattern="[Download]({{{_value}}})",
...         show_foreign_key_link=False
...     )
... )

Array aggregate with display options::

>>> PseudoColumn(
...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
...     aggregate=Aggregate.ARRAY_D,
...     display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV),
...     markdown_name="Tags"
... )
Source code in src/deriva_ml/model/annotations.py
@dataclass
class PseudoColumn:
    """A pseudo-column definition for visible columns and foreign keys.

    Pseudo-columns display computed values, values from related tables,
    or custom markdown patterns. They appear as columns in table views
    but are not actual database columns.

    Args:
        source: Path to source data. Can be:
            - A column name (string)
            - A list of FK path steps ending with a column name
        sourcekey: Reference to a named source in source-definitions annotation
        markdown_name: Display name for the column (supports markdown)
        comment: Description/tooltip text (or False to hide)
        entity: Whether this represents an entity (affects rendering)
        aggregate: Aggregation function when source returns multiple values
        self_link: Make the value a link to the current row
        display: Display formatting options
        array_options: Options for array aggregates (max_length, order)

    Note:
        source and sourcekey are mutually exclusive. Use source for inline
        definitions, sourcekey to reference pre-defined sources.

    Raises:
        ValueError: If both source and sourcekey are provided

    Example:
        Simple column with custom display name::

            >>> PseudoColumn(source="Internal_ID", markdown_name="ID")

        Outbound FK traversal (display value from referenced table)::

            >>> # Subject has FK to Species - show Species.Name
            >>> PseudoColumn(
            ...     source=[OutboundFK("domain", "Subject_Species_fkey"), "Name"],
            ...     markdown_name="Species"
            ... )

        Inbound FK with aggregation (count related records)::

            >>> # Count images pointing to this subject
            >>> PseudoColumn(
            ...     source=[InboundFK("domain", "Image_Subject_fkey"), "RID"],
            ...     aggregate=Aggregate.CNT,
            ...     markdown_name="Images"
            ... )

        Multi-hop FK path::

            >>> # Image -> Subject -> Species
            >>> PseudoColumn(
            ...     source=[
            ...         OutboundFK("domain", "Image_Subject_fkey"),
            ...         OutboundFK("domain", "Subject_Species_fkey"),
            ...         "Name"
            ...     ],
            ...     markdown_name="Species"
            ... )

        With custom display formatting::

            >>> PseudoColumn(
            ...     source="URL",
            ...     display=PseudoColumnDisplay(
            ...         markdown_pattern="[Download]({{{_value}}})",
            ...         show_foreign_key_link=False
            ...     )
            ... )

        Array aggregate with display options::

            >>> PseudoColumn(
            ...     source=[InboundFK("domain", "Tag_Item_fkey"), "Name"],
            ...     aggregate=Aggregate.ARRAY_D,
            ...     display=PseudoColumnDisplay(array_ux_mode=ArrayUxMode.CSV),
            ...     markdown_name="Tags"
            ... )
    """
    source: str | list[str | InboundFK | OutboundFK] | None = None
    sourcekey: str | None = None
    markdown_name: str | None = None
    comment: str | Literal[False] | None = None
    entity: bool | None = None
    aggregate: Aggregate | None = None
    self_link: bool | None = None
    display: PseudoColumnDisplay | None = None
    array_options: dict[str, Any] | None = None  # Can be complex

    def __post_init__(self):
        if self.source is not None and self.sourcekey is not None:
            raise ValueError("source and sourcekey are mutually exclusive")

    def to_dict(self) -> dict[str, Any]:
        result = {}

        if self.source is not None:
            if isinstance(self.source, str):
                result["source"] = self.source
            else:
                # Convert path elements
                result["source"] = [
                    item.to_dict() if hasattr(item, "to_dict") else item
                    for item in self.source
                ]

        if self.sourcekey is not None:
            result["sourcekey"] = self.sourcekey
        if self.markdown_name is not None:
            result["markdown_name"] = self.markdown_name
        if self.comment is not None:
            result["comment"] = self.comment
        if self.entity is not None:
            result["entity"] = self.entity
        if self.aggregate is not None:
            result["aggregate"] = self.aggregate.value
        if self.self_link is not None:
            result["self_link"] = self.self_link
        if self.display is not None:
            result["display"] = self.display.to_dict()
        if self.array_options is not None:
            result["array_options"] = self.array_options

        return result
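
Putting the pieces together, the "count related images" example above would serialize to roughly the following annotation dictionary. The FK constraint name is hypothetical, and Aggregate.CNT is assumed to have the value "cnt":

```python
# Source path: a list of step dicts ending in a column name, as
# produced by to_dict above. "cnt" is the assumed Aggregate.CNT value.
annotation = {
    "source": [{"inbound": ["domain", "Image_Subject_fkey"]}, "RID"],
    "aggregate": "cnt",
    "markdown_name": "Images",
}
assert annotation["source"][-1] == "RID"
```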

PseudoColumnDisplay dataclass

Display options for a pseudo-column.

Parameters:

- markdown_pattern (str | None): Handlebars/mustache template. Default: None.
- template_engine (TemplateEngine | None): Template engine to use. Default: None.
- show_foreign_key_link (bool | None): Show as clickable link. Default: None.
- array_ux_mode (ArrayUxMode | None): How to render array values. Default: None.
- column_order (list[SortKey] | Literal[False] | None): Sort order for the column, or False to disable. Default: None.
- wait_for (list[str] | None): Template variables to wait for before rendering. Default: None.
Source code in src/deriva_ml/model/annotations.py
@dataclass
class PseudoColumnDisplay:
    """Display options for a pseudo-column.

    Args:
        markdown_pattern: Handlebars/mustache template
        template_engine: Template engine to use
        show_foreign_key_link: Show as clickable link
        array_ux_mode: How to render array values
        column_order: Sort order for the column, or False to disable
        wait_for: Template variables to wait for before rendering
    """
    markdown_pattern: str | None = None
    template_engine: TemplateEngine | None = None
    show_foreign_key_link: bool | None = None
    array_ux_mode: ArrayUxMode | None = None
    column_order: list[SortKey] | Literal[False] | None = None
    wait_for: list[str] | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.markdown_pattern is not None:
            result["markdown_pattern"] = self.markdown_pattern
        if self.template_engine is not None:
            result["template_engine"] = self.template_engine.value
        if self.show_foreign_key_link is not None:
            result["show_foreign_key_link"] = self.show_foreign_key_link
        if self.array_ux_mode is not None:
            result["array_ux_mode"] = self.array_ux_mode.value
        if self.column_order is not None:
            if self.column_order is False:
                result["column_order"] = False
            else:
                result["column_order"] = [
                    k.to_dict() if isinstance(k, SortKey) else k
                    for k in self.column_order
                ]
        if self.wait_for is not None:
            result["wait_for"] = self.wait_for
        return result
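
The column_order field is tri-state: None omits the key, an explicit False disables sorting, and a list supplies sort keys. A minimal sketch of that branch in to_dict above:

```python
def serialize_column_order(column_order):
    # None -> omit the key; False -> emit literal False; list -> emit as-is
    if column_order is None:
        return {}
    if column_order is False:
        return {"column_order": False}
    return {"column_order": list(column_order)}

assert serialize_column_order(None) == {}
assert serialize_column_order(False) == {"column_order": False}
assert serialize_column_order(["Name"]) == {"column_order": ["Name"]}
```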

SchemaBuilder

Creates SQLAlchemy ORM from a Deriva catalog model.

Phase 1 of the two-phase database creation pattern. This class handles only schema/ORM creation - no data loading.

The Model can come from either a live catalog or a schema.json file:

- From catalog: model = catalog.getCatalogModel()
- From file: model = Model.fromfile("file-system", "path/to/schema.json")

Example

    # Create ORM from catalog model
    model = catalog.getCatalogModel()
    builder = SchemaBuilder(model, schemas=['domain', 'deriva-ml'])
    orm = builder.build()

    # Create ORM from schema file
    model = Model.fromfile("file-system", "schema.json")
    builder = SchemaBuilder(model, schemas=['domain'], database_path="local.db")
    orm = builder.build()

    # Use the ORM
    ImageClass = orm.get_orm_class("Image")
    with Session(orm.engine) as session:
        images = session.query(ImageClass).all()

    # Clean up
    orm.dispose()

Source code in src/deriva_ml/model/schema_builder.py
class SchemaBuilder:
    """Creates SQLAlchemy ORM from a Deriva catalog model.

    Phase 1 of the two-phase database creation pattern. This class handles
    only schema/ORM creation - no data loading.

    The Model can come from either a live catalog or a schema.json file:
    - From catalog: model = catalog.getCatalogModel()
    - From file: model = Model.fromfile("file-system", "path/to/schema.json")

    Example:
        # Create ORM from catalog model
        model = catalog.getCatalogModel()
        builder = SchemaBuilder(model, schemas=['domain', 'deriva-ml'])
        orm = builder.build()

        # Create ORM from schema file
        model = Model.fromfile("file-system", "schema.json")
        builder = SchemaBuilder(model, schemas=['domain'], database_path="local.db")
        orm = builder.build()

        # Use the ORM
        ImageClass = orm.get_orm_class("Image")
        with Session(orm.engine) as session:
            images = session.query(ImageClass).all()

        # Clean up
        orm.dispose()
    """

    # Type mapping from ERMrest to SQLAlchemy
    _TYPE_MAP = {
        "boolean": ERMRestBoolean,
        "date": StringToDate,
        "float4": StringToFloat,
        "float8": StringToFloat,
        "int2": StringToInteger,
        "int4": StringToInteger,
        "int8": StringToInteger,
        "json": JSON,
        "jsonb": JSON,
        "timestamptz": StringToDateTime,
        "timestamp": StringToDateTime,
    }

    def __init__(
        self,
        model: Model,
        schemas: list[str],
        database_path: Path | str = ":memory:",
    ):
        """Initialize the schema builder.

        Args:
            model: ERMrest Model object (from catalog or schema.json file).
            schemas: List of schema names to include in the ORM.
            database_path: Path to SQLite database file. Use ":memory:" for
                in-memory database (default). If a Path or string is provided,
                separate .db files will be created for each schema.
        """
        self.model = model
        self.schemas = schemas
        self.database_path = Path(database_path) if database_path != ":memory:" else database_path

        # Will be set during build()
        self.engine: Engine | None = None
        self.metadata: MetaData | None = None
        self.Base: AutomapBase | None = None
        self._class_prefix: str = ""

    @staticmethod
    def _sql_type(deriva_type: DerivaType) -> TypeEngine:
        """Map ERMrest type to SQLAlchemy type with CSV string conversion.

        Args:
            deriva_type: ERMrest type object.

        Returns:
            SQLAlchemy type class.
        """
        return SchemaBuilder._TYPE_MAP.get(deriva_type.typename, String)

    def _is_key_column(self, column: DerivaColumn, table: DerivaTable) -> bool:
        """Check if column is the primary key (RID)."""
        return column in [key.unique_columns[0] for key in table.keys] and column.name == "RID"

    def build(self) -> SchemaORM:
        """Build the SQLAlchemy ORM structure.

        Creates SQLite tables from the ERMrest schema and generates
        ORM classes via SQLAlchemy automap.

        Returns:
            SchemaORM object containing engine, metadata, Base, and utilities.

        Note:
            In-memory databases (database_path=":memory:") do not support
            SQLite schema attachments, so all tables will be created in a
            single database without schema prefixes in table names.
        """
        # Create unique prefix for ORM class names
        self._class_prefix = f"_{id(self)}_"

        # Determine if we're using in-memory or file-based database
        self._use_schemas = self.database_path != ":memory:"

        # Create engine
        if self.database_path == ":memory:":
            self.engine = create_engine("sqlite:///:memory:", future=True)
        else:
            # Ensure the database path exists
            if isinstance(self.database_path, Path):
                if self.database_path.suffix == ".db":
                    # Single file path
                    self.database_path.parent.mkdir(parents=True, exist_ok=True)
                    main_db = self.database_path
                else:
                    # Directory path
                    self.database_path.mkdir(parents=True, exist_ok=True)
                    main_db = self.database_path / "main.db"
            else:
                main_db = Path(self.database_path)
                main_db.parent.mkdir(parents=True, exist_ok=True)

            self.engine = create_engine(f"sqlite:///{main_db.resolve()}", future=True)

            # Attach schema-specific databases
            event.listen(self.engine, "connect", self._attach_schemas)

        self.metadata = MetaData()
        self.Base = automap_base(metadata=self.metadata)

        # Build the schema
        self._create_tables()

        logger.info(
            "Built ORM for schemas %s with %d tables",
            self.schemas,
            len(self.metadata.tables),
        )

        return SchemaORM(
            engine=self.engine,
            metadata=self.metadata,
            Base=self.Base,
            model=self.model,
            schemas=self.schemas,
            class_prefix=self._class_prefix,
            use_schemas=self._use_schemas,
        )

    def _attach_schemas(self, dbapi_conn, _conn_record):
        """Attach schema-specific SQLite databases."""
        cur = dbapi_conn.cursor()
        db_dir = self.database_path if self.database_path.is_dir() else self.database_path.parent
        for schema in self.schemas:
            schema_file = (db_dir / f"{schema}.db").resolve()
            cur.execute(f"ATTACH DATABASE '{schema_file}' AS '{schema}'")
        cur.close()

    def _create_tables(self) -> None:
        """Create SQLite tables from the ERMrest schema."""

        def col(model, name: str):
            """Get column from ORM class, handling both attribute and table column access."""
            try:
                return getattr(model, name).property.columns[0]
            except AttributeError:
                return model.__table__.c[name]

        def guess_attr_name(col_name: str) -> str:
            """Generate relationship attribute name from column name."""
            return col_name[:-3] if col_name.lower().endswith("_id") else col_name

        def make_table_name(schema_name: str, table_name: str) -> str:
            """Generate table name, including schema prefix if using schemas."""
            if self._use_schemas:
                return f"{schema_name}.{table_name}"
            else:
                # For in-memory, use underscore separator to avoid conflicts
                return f"{schema_name}_{table_name}"

        database_tables: list[SQLTable] = []

        for schema_name in self.schemas:
            if schema_name not in self.model.schemas:
                logger.warning(f"Schema {schema_name} not found in model")
                continue

            for table in self.model.schemas[schema_name].tables.values():
                database_columns: list[SQLColumn] = []

                for c in table.columns:
                    database_column = SQLColumn(
                        name=c.name,
                        type_=self._sql_type(c.type),
                        comment=c.comment,
                        default=c.default,
                        primary_key=self._is_key_column(c, table),
                        nullable=c.nullok,
                    )
                    database_columns.append(database_column)

                # Use schema prefix only for file-based databases
                if self._use_schemas:
                    database_table = SQLTable(
                        table.name, self.metadata, *database_columns, schema=schema_name
                    )
                else:
                    # For in-memory, embed schema in table name
                    full_name = f"{schema_name}_{table.name}".replace("-", "_")
                    database_table = SQLTable(
                        full_name, self.metadata, *database_columns
                    )

                # Add unique constraints
                for key in table.keys:
                    key_columns = [c.name for c in key.unique_columns]
                    database_table.append_constraint(
                        SQLUniqueConstraint(*key_columns, name=key.name[1])
                    )

                # Add foreign key constraints (within same schema only for now)
                for fk in table.foreign_keys:
                    if fk.pk_table.schema.name not in self.schemas:
                        continue
                    if fk.pk_table.schema.name != schema_name:
                        continue

                    # Build reference column names
                    if self._use_schemas:
                        refcols = [
                            f"{schema_name}.{c.table.name}.{c.name}"
                            for c in fk.referenced_columns
                        ]
                    else:
                        # For in-memory, use the embedded schema name
                        ref_table_name = f"{schema_name}_{fk.pk_table.name}".replace("-", "_")
                        refcols = [
                            f"{ref_table_name}.{c.name}"
                            for c in fk.referenced_columns
                        ]

                    database_table.append_constraint(
                        SQLForeignKeyConstraint(
                            columns=[f"{c.name}" for c in fk.foreign_key_columns],
                            refcolumns=refcols,
                            name=fk.name[1],
                            comment=fk.comment,
                        )
                    )

                database_tables.append(database_table)

        # Create all tables
        with self.engine.begin() as conn:
            self.metadata.create_all(conn, tables=database_tables, checkfirst=True)

        # Configure ORM class naming
        def name_for_scalar_relationship(_base, local_cls, referred_cls, constraint):
            cols = list(constraint.columns) if constraint is not None else []
            if len(cols) == 1:
                name = cols[0].key
                if name in {c.key for c in local_cls.__table__.columns}:
                    name += "_rel"
                return name
            return constraint.name or referred_cls.__name__.lower()

        def name_for_collection_relationship(_base, local_cls, referred_cls, constraint):
            # constraint.name may be None for unnamed constraints; fall back
            # to a name derived from the referred class in that case.
            backref_name = constraint.name.replace("_fkey", "_collection") if constraint.name else None
            return backref_name or (referred_cls.__name__.lower() + "_collection")

        def classname_for_table(_base, tablename, table):
            return self._class_prefix + tablename.replace(".", "_").replace("-", "_")

        # Build ORM mappings
        self.Base.prepare(
            self.engine,
            name_for_scalar_relationship=name_for_scalar_relationship,
            name_for_collection_relationship=name_for_collection_relationship,
            classname_for_table=classname_for_table,
            reflect=True,
        )

        # Add cross-schema relationships
        for schema_name in self.schemas:
            if schema_name not in self.model.schemas:
                continue

            for table in self.model.schemas[schema_name].tables.values():
                for fk in table.foreign_keys:
                    if fk.pk_table.schema.name not in self.schemas:
                        continue
                    if fk.pk_table.schema.name == schema_name:
                        continue

                    table_name = make_table_name(schema_name, table.name)
                    table_class = self._get_orm_class_by_name(table_name)
                    foreign_key_column_name = fk.foreign_key_columns[0].name
                    foreign_key_column = col(table_class, foreign_key_column_name)

                    referenced_table_name = make_table_name(fk.pk_table.schema.name, fk.pk_table.name)
                    referenced_class = self._get_orm_class_by_name(referenced_table_name)
                    referenced_column = col(referenced_class, fk.referenced_columns[0].name)

                    relationship_attr = guess_attr_name(foreign_key_column_name)
                    backref_attr = fk.name[1].replace("_fkey", "_collection")

                    # Check if relationship already exists
                    existing_attr = getattr(table_class, relationship_attr, None)
                    from sqlalchemy.orm import RelationshipProperty
                    from sqlalchemy.orm.attributes import InstrumentedAttribute

                    is_relationship = isinstance(existing_attr, InstrumentedAttribute) and isinstance(
                        existing_attr.property, RelationshipProperty
                    )
                    if not is_relationship:
                        setattr(
                            table_class,
                            relationship_attr,
                            relationship(
                                referenced_class,
                                foreign_keys=[foreign_key_column],
                                primaryjoin=foreign(foreign_key_column) == referenced_column,
                                backref=backref(backref_attr, viewonly=True),
                                viewonly=True,
                            ),
                        )

        # Configure mappers
        self.Base.registry.configure()

    def _get_orm_class_by_name(self, table_name: str) -> Any | None:
        """Get ORM class by table name (internal use during build).

        Handles both schema.table format (file-based) and schema_table format (in-memory).
        """
        # Try exact match first
        if table_name in self.metadata.tables:
            sql_table = self.metadata.tables[table_name]
        else:
            # For in-memory databases, table names use underscore separator
            # Try converting schema.table to schema_table format
            if "." in table_name and not self._use_schemas:
                converted_name = table_name.replace(".", "_").replace("-", "_")
                if converted_name in self.metadata.tables:
                    sql_table = self.metadata.tables[converted_name]
                else:
                    sql_table = None
            else:
                # Try matching just the table name part
                sql_table = None
                for full_name, table in self.metadata.tables.items():
                    # Handle both . and _ separators
                    if "." in full_name:
                        table_part = full_name.split(".")[-1]
                    elif "_" in full_name:
                        table_part = full_name.split("_", 1)[-1]
                    else:
                        table_part = full_name
                    if table_part == table_name or full_name.endswith(f"_{table_name}"):
                        sql_table = table
                        break

        if sql_table is None:
            raise KeyError(f"Table {table_name} not found")

        for mapper in self.Base.registry.mappers:
            if mapper.persist_selectable is sql_table or sql_table in mapper.tables:
                return mapper.class_
        return None

__init__

__init__(
    model: Model,
    schemas: list[str],
    database_path: Path | str = ":memory:",
)

Initialize the schema builder.

Parameters:

Name Type Description Default
model Model

ERMrest Model object (from catalog or schema.json file).

required
schemas list[str]

List of schema names to include in the ORM.

required
database_path Path | str

Path to SQLite database file. Use ":memory:" for in-memory database (default). If a Path or string is provided, separate .db files will be created for each schema.

':memory:'
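The `database_path` handling can be sketched with a hypothetical helper (not part of the library) that mirrors how `build()` interprets this argument:

```python
from pathlib import Path


def resolve_main_db(database_path):
    # ":memory:" keeps everything in a single in-memory database; a path
    # ending in ".db" is used as the main database file directly; any other
    # path is treated as a directory that will hold main.db plus one .db
    # file per attached schema.
    if database_path == ":memory:":
        return None
    p = Path(database_path)
    return p if p.suffix == ".db" else p / "main.db"


resolve_main_db(":memory:")               # None (in-memory)
resolve_main_db("snapshots/catalog.db")   # Path("snapshots/catalog.db")
resolve_main_db("snapshots")              # Path("snapshots/main.db")
```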
Source code in src/deriva_ml/model/schema_builder.py
def __init__(
    self,
    model: Model,
    schemas: list[str],
    database_path: Path | str = ":memory:",
):
    """Initialize the schema builder.

    Args:
        model: ERMrest Model object (from catalog or schema.json file).
        schemas: List of schema names to include in the ORM.
        database_path: Path to SQLite database file. Use ":memory:" for
            in-memory database (default). If a Path or string is provided,
            separate .db files will be created for each schema.
    """
    self.model = model
    self.schemas = schemas
    self.database_path = Path(database_path) if database_path != ":memory:" else database_path

    # Will be set during build()
    self.engine: Engine | None = None
    self.metadata: MetaData | None = None
    self.Base: AutomapBase | None = None
    self._class_prefix: str = ""

build

build() -> SchemaORM

Build the SQLAlchemy ORM structure.

Creates SQLite tables from the ERMrest schema and generates ORM classes via SQLAlchemy automap.

Returns:

Type Description
SchemaORM

SchemaORM object containing engine, metadata, Base, and utilities.

Note

In-memory databases (database_path=":memory:") do not support SQLite schema attachments, so all tables will be created in a single database without schema prefixes in table names.
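The naming consequence of this note can be sketched in plain Python. The hypothetical `table_key` helper below (not part of the library) mirrors how `build()` names tables in each mode:

```python
def table_key(schema: str, table: str, use_schemas: bool) -> str:
    # File-based databases ATTACH one SQLite file per schema, so a table
    # keeps a real schema qualifier: "schema.table".
    if use_schemas:
        return f"{schema}.{table}"
    # In-memory databases cannot use ATTACH, so the schema name is embedded
    # in the table name and hyphens are normalized to underscores.
    return f"{schema}_{table}".replace("-", "_")


table_key("deriva-ml", "Dataset", use_schemas=True)   # "deriva-ml.Dataset"
table_key("deriva-ml", "Dataset", use_schemas=False)  # "deriva_ml_Dataset"
```

This is why `find_table` and `get_orm_class` accept both naming formats.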

Source code in src/deriva_ml/model/schema_builder.py
def build(self) -> SchemaORM:
    """Build the SQLAlchemy ORM structure.

    Creates SQLite tables from the ERMrest schema and generates
    ORM classes via SQLAlchemy automap.

    Returns:
        SchemaORM object containing engine, metadata, Base, and utilities.

    Note:
        In-memory databases (database_path=":memory:") do not support
        SQLite schema attachments, so all tables will be created in a
        single database without schema prefixes in table names.
    """
    # Create unique prefix for ORM class names
    self._class_prefix = f"_{id(self)}_"

    # Determine if we're using in-memory or file-based database
    self._use_schemas = self.database_path != ":memory:"

    # Create engine
    if self.database_path == ":memory:":
        self.engine = create_engine("sqlite:///:memory:", future=True)
    else:
        # Ensure the database path exists
        if isinstance(self.database_path, Path):
            if self.database_path.suffix == ".db":
                # Single file path
                self.database_path.parent.mkdir(parents=True, exist_ok=True)
                main_db = self.database_path
            else:
                # Directory path
                self.database_path.mkdir(parents=True, exist_ok=True)
                main_db = self.database_path / "main.db"
        else:
            main_db = Path(self.database_path)
            main_db.parent.mkdir(parents=True, exist_ok=True)

        self.engine = create_engine(f"sqlite:///{main_db.resolve()}", future=True)

        # Attach schema-specific databases
        event.listen(self.engine, "connect", self._attach_schemas)

    self.metadata = MetaData()
    self.Base = automap_base(metadata=self.metadata)

    # Build the schema
    self._create_tables()

    logger.info(
        "Built ORM for schemas %s with %d tables",
        self.schemas,
        len(self.metadata.tables),
    )

    return SchemaORM(
        engine=self.engine,
        metadata=self.metadata,
        Base=self.Base,
        model=self.model,
        schemas=self.schemas,
        class_prefix=self._class_prefix,
        use_schemas=self._use_schemas,
    )

SchemaORM

Container for SQLAlchemy ORM components.

Provides access to the ORM structure and utility methods for table/class lookup. This is the result of Phase 1 (SchemaBuilder).

Attributes:

Name Type Description
engine

SQLAlchemy Engine for database connections.

metadata

SQLAlchemy MetaData with table definitions.

Base

SQLAlchemy automap base for ORM classes.

model

ERMrest Model the ORM was built from.

schemas

List of schema names included.

use_schemas

Whether schema prefixes are used (False for in-memory).

Source code in src/deriva_ml/model/schema_builder.py
class SchemaORM:
    """Container for SQLAlchemy ORM components.

    Provides access to the ORM structure and utility methods for
    table/class lookup. This is the result of Phase 1 (SchemaBuilder).

    Attributes:
        engine: SQLAlchemy Engine for database connections.
        metadata: SQLAlchemy MetaData with table definitions.
        Base: SQLAlchemy automap base for ORM classes.
        model: ERMrest Model the ORM was built from.
        schemas: List of schema names included.
        use_schemas: Whether schema prefixes are used (False for in-memory).
    """

    def __init__(
        self,
        engine: Engine,
        metadata: MetaData,
        Base: AutomapBase,
        model: Model,
        schemas: list[str],
        class_prefix: str,
        use_schemas: bool = True,
    ):
        """Initialize SchemaORM container.

        Args:
            engine: SQLAlchemy Engine.
            metadata: SQLAlchemy MetaData with tables.
            Base: Automap base with ORM classes.
            model: Source ERMrest Model.
            schemas: Schemas that were included.
            class_prefix: Prefix used for ORM class names.
            use_schemas: Whether schema prefixes are used (False for in-memory).
        """
        self.engine = engine
        self.metadata = metadata
        self.Base = Base
        self.model = model
        self.schemas = schemas
        self._class_prefix = class_prefix
        self._use_schemas = use_schemas
        self._disposed = False

    def list_tables(self) -> list[str]:
        """List all tables in the database.

        Returns:
            List of fully-qualified table names (schema.table), sorted.
        """
        tables = list(self.metadata.tables.keys())
        tables.sort()
        return tables

    def find_table(self, table_name: str) -> SQLTable:
        """Find a table by name.

        Handles both schema.table format and schema_table format (for in-memory databases).

        Args:
            table_name: Table name, with or without schema prefix.
                Can be "schema.table", "schema_table", or just "table".

        Returns:
            SQLAlchemy Table object.

        Raises:
            KeyError: If table not found.
        """
        # Try exact match first
        if table_name in self.metadata.tables:
            return self.metadata.tables[table_name]

        # Try converting schema.table to schema_table format (for in-memory)
        if "." in table_name and not self._use_schemas:
            converted_name = table_name.replace(".", "_").replace("-", "_")
            if converted_name in self.metadata.tables:
                return self.metadata.tables[converted_name]

        # Try matching just the table name part
        for full_name, table in self.metadata.tables.items():
            # Handle . separator (file-based)
            if "." in full_name and full_name.split(".")[-1] == table_name:
                return table
            # Handle _ separator (in-memory) - match suffix after first _
            if "_" in full_name and "." not in full_name:
                # Check if table_name matches the part after schema prefix
                parts = full_name.split("_", 1)
                if len(parts) > 1 and parts[1] == table_name:
                    return table
                # Also check if it ends with the table name
                if full_name.endswith(f"_{table_name}"):
                    return table

        raise KeyError(f"Table {table_name} not found")

    def get_orm_class(self, table_name: str) -> Any | None:
        """Get the ORM class for a table by name.

        Args:
            table_name: Table name, with or without schema prefix.

        Returns:
            SQLAlchemy ORM class for the table.

        Raises:
            KeyError: If table not found.
        """
        sql_table = self.find_table(table_name)
        return self.get_orm_class_for_table(sql_table)

    def get_orm_class_for_table(self, table: SQLTable | DerivaTable | str) -> Any | None:
        """Get the ORM class for a table.

        Args:
            table: SQLAlchemy Table, Deriva Table, or table name.

        Returns:
            SQLAlchemy ORM class, or None if not found.
        """
        if isinstance(table, DerivaTable):
            # Try schema.table format first (file-based), then schema_table (in-memory)
            deriva_table = table
            table_key = f"{deriva_table.schema.name}.{deriva_table.name}"
            table = self.metadata.tables.get(table_key)
            if table is None and not self._use_schemas:
                # Try underscore format for in-memory databases; use the saved
                # Deriva table here, since `table` may now be None
                table_key = f"{deriva_table.schema.name}_{deriva_table.name}".replace("-", "_")
                table = self.metadata.tables.get(table_key)
        if isinstance(table, str):
            table = self.find_table(table)
        if table is None:
            return None

        for mapper in self.Base.registry.mappers:
            if mapper.persist_selectable is table or table in mapper.tables:
                return mapper.class_
        return None

    def get_table_contents(self, table: str) -> Generator[dict[str, Any], None, None]:
        """Retrieve all rows from a table as dictionaries.

        Args:
            table: Table name (with or without schema prefix).

        Yields:
            Dictionary for each row with column names as keys.
        """
        sql_table = self.find_table(table)
        with self.engine.connect() as conn:
            result = conn.execute(select(sql_table))
            for row in result.mappings():
                yield dict(row)

    @staticmethod
    def is_association_table(
        table_class,
        min_arity: int = 2,
        max_arity: int = 2,
        unqualified: bool = True,
        pure: bool = True,
        no_overlap: bool = True,
        return_fkeys: bool = False,
    ):
        """Check if an ORM class represents an association table.

        An association table links two or more tables through foreign keys,
        with a composite unique key covering those foreign keys.

        Args:
            table_class: SQLAlchemy ORM class to check.
            min_arity: Minimum number of foreign keys (default 2).
            max_arity: Maximum number of foreign keys (default 2).
            unqualified: If True, reject associations with extra key columns.
            pure: If True, reject associations with extra non-key columns.
            no_overlap: If True, reject associations with shared FK columns.
            return_fkeys: If True, return the foreign keys instead of arity.

        Returns:
            If return_fkeys=False: Integer arity if association, False otherwise.
            If return_fkeys=True: Set of foreign keys if association, False otherwise.
        """
        if min_arity < 2:
            raise ValueError("An association cannot have arity < 2")
        if max_arity is not None and max_arity < min_arity:
            raise ValueError("max_arity cannot be less than min_arity")

        mapper = inspect(table_class).mapper
        system_cols = {"RID", "RCT", "RMT", "RCB", "RMB"}

        non_sys_cols = {
            col.name for col in mapper.columns if col.name not in system_cols
        }

        unique_columns = [
            {c.name for c in constraint.columns}
            for constraint in inspect(table_class).local_table.constraints
            if isinstance(constraint, SQLUniqueConstraint)
        ]

        non_sys_key_colsets = {
            frozenset(uc)
            for uc in unique_columns
            if uc.issubset(non_sys_cols) and len(uc) > 1
        }

        if not non_sys_key_colsets:
            return False

        # Choose longest compound key
        row_key = sorted(non_sys_key_colsets, key=lambda s: len(s), reverse=True)[0]
        foreign_keys = list(inspect(table_class).relationships.values())

        covered_fkeys = {
            fkey for fkey in foreign_keys
            if {c.name for c in fkey.local_columns}.issubset(row_key)
        }
        covered_fkey_cols = set()

        if len(covered_fkeys) < min_arity:
            return False
        if max_arity is not None and len(covered_fkeys) > max_arity:
            return False

        for fkey in covered_fkeys:
            fkcols = {c.name for c in fkey.local_columns}
            if no_overlap and fkcols.intersection(covered_fkey_cols):
                return False
            covered_fkey_cols.update(fkcols)

        if unqualified and row_key.difference(covered_fkey_cols):
            return False

        if pure and non_sys_cols.difference(row_key):
            return False

        return covered_fkeys if return_fkeys else len(covered_fkeys)

    def get_association_class(
        self,
        left_cls: Type[Any],
        right_cls: Type[Any],
    ) -> tuple[Any, Any, Any] | None:
        """Find an association class connecting two ORM classes.

        Args:
            left_cls: First ORM class.
            right_cls: Second ORM class.

        Returns:
            Tuple of (association_class, left_relationship, right_relationship),
            or None if no association found.
        """
        for _, left_rel in inspect(left_cls).relationships.items():
            mid_cls = left_rel.mapper.class_
            is_assoc = self.is_association_table(mid_cls, return_fkeys=True)

            if not is_assoc:
                continue

            assoc_local_columns_left = list(is_assoc)[0].local_columns
            assoc_local_columns_right = list(is_assoc)[1].local_columns

            found_left = found_right = False

            for r in inspect(left_cls).relationships.values():
                remote_side = list(r.remote_side)[0]
                if remote_side in assoc_local_columns_left:
                    found_left = r
                if remote_side in assoc_local_columns_right:
                    found_left = r
                    # Swap if backwards
                    assoc_local_columns_left, assoc_local_columns_right = (
                        assoc_local_columns_right,
                        assoc_local_columns_left,
                    )

            for r in inspect(right_cls).relationships.values():
                remote_side = list(r.remote_side)[0]
                if remote_side in assoc_local_columns_right:
                    found_right = r

            if found_left and found_right:
                return mid_cls, found_left.class_attribute, found_right.class_attribute

        return None

    def dispose(self) -> None:
        """Dispose of SQLAlchemy resources.

        Call this when done with the database to properly clean up connections.
        After calling dispose(), the instance should not be used further.
        """
        if self._disposed:
            return

        if hasattr(self, "Base") and self.Base is not None:
            self.Base.registry.dispose()
        if hasattr(self, "engine") and self.engine is not None:
            self.engine.dispose()

        self._disposed = True

    def __del__(self) -> None:
        """Cleanup resources when garbage collected.

        Best-effort. ``__del__`` runs at unpredictable points, including
        interpreter shutdown when SQLAlchemy module-level globals
        (registries, engines) may already be partially torn down. In
        that race we'd see ``AttributeError: 'NoneType' object has no
        attribute '_dispose_registries'`` printed via ``Exception
        ignored in:`` — benign but noisy enough to make every short
        script look like it failed. Swallow everything here; the
        explicit ``dispose()`` callable from ``__exit__`` and from
        callers still raises normally.
        """
        try:
            self.dispose()
        except Exception:
            pass

    def __enter__(self) -> "SchemaORM":
        """Context manager entry."""
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        """Context manager exit - dispose resources."""
        self.dispose()
        return False

__del__

__del__() -> None

Cleanup resources when garbage collected.

Best-effort. __del__ runs at unpredictable points, including interpreter shutdown when SQLAlchemy module-level globals (registries, engines) may already be partially torn down. In that race we'd see AttributeError: 'NoneType' object has no attribute '_dispose_registries' printed via Exception ignored in: — benign but noisy enough to make every short script look like it failed. Swallow everything here; the explicit dispose() callable from __exit__ and from callers still raises normally.

Source code in src/deriva_ml/model/schema_builder.py
def __del__(self) -> None:
    """Cleanup resources when garbage collected.

    Best-effort. ``__del__`` runs at unpredictable points, including
    interpreter shutdown when SQLAlchemy module-level globals
    (registries, engines) may already be partially torn down. In
    that race we'd see ``AttributeError: 'NoneType' object has no
    attribute '_dispose_registries'`` printed via ``Exception
    ignored in:`` — benign but noisy enough to make every short
    script look like it failed. Swallow everything here; the
    explicit ``dispose()`` callable from ``__exit__`` and from
    callers still raises normally.
    """
    try:
        self.dispose()
    except Exception:
        pass

__enter__

__enter__() -> 'SchemaORM'

Context manager entry.

Source code in src/deriva_ml/model/schema_builder.py
def __enter__(self) -> "SchemaORM":
    """Context manager entry."""
    return self

__exit__

__exit__(
    exc_type, exc_val, exc_tb
) -> bool

Context manager exit - dispose resources.

Source code in src/deriva_ml/model/schema_builder.py
def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
    """Context manager exit - dispose resources."""
    self.dispose()
    return False

__init__

__init__(
    engine: Engine,
    metadata: MetaData,
    Base: AutomapBase,
    model: Model,
    schemas: list[str],
    class_prefix: str,
    use_schemas: bool = True,
)

Initialize SchemaORM container.

Parameters:

Name Type Description Default
engine Engine

SQLAlchemy Engine.

required
metadata MetaData

SQLAlchemy MetaData with tables.

required
Base AutomapBase

Automap base with ORM classes.

required
model Model

Source ERMrest Model.

required
schemas list[str]

Schemas that were included.

required
class_prefix str

Prefix used for ORM class names.

required
use_schemas bool

Whether schema prefixes are used (False for in-memory).

True
Source code in src/deriva_ml/model/schema_builder.py
def __init__(
    self,
    engine: Engine,
    metadata: MetaData,
    Base: AutomapBase,
    model: Model,
    schemas: list[str],
    class_prefix: str,
    use_schemas: bool = True,
):
    """Initialize SchemaORM container.

    Args:
        engine: SQLAlchemy Engine.
        metadata: SQLAlchemy MetaData with tables.
        Base: Automap base with ORM classes.
        model: Source ERMrest Model.
        schemas: Schemas that were included.
        class_prefix: Prefix used for ORM class names.
        use_schemas: Whether schema prefixes are used (False for in-memory).
    """
    self.engine = engine
    self.metadata = metadata
    self.Base = Base
    self.model = model
    self.schemas = schemas
    self._class_prefix = class_prefix
    self._use_schemas = use_schemas
    self._disposed = False

dispose

dispose() -> None

Dispose of SQLAlchemy resources.

Call this when done with the database to properly clean up connections. After calling dispose(), the instance should not be used further.
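The cleanup pattern used here (idempotent `dispose()` also invoked from `__exit__`) can be sketched with a standalone toy class, shown for illustration only:

```python
class Disposable:
    # Mirrors SchemaORM's cleanup pattern: dispose() is idempotent and is
    # also invoked by __exit__, so a `with` block guarantees cleanup even
    # if callers also dispose explicitly.
    def __init__(self):
        self._disposed = False
        self.dispose_calls = 0

    def dispose(self):
        if self._disposed:
            return
        self.dispose_calls += 1  # release engine/registry resources here
        self._disposed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self.dispose()
        return False  # never suppress exceptions


with Disposable() as d:
    pass
d.dispose()  # safe: second call is a no-op
```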

Source code in src/deriva_ml/model/schema_builder.py
def dispose(self) -> None:
    """Dispose of SQLAlchemy resources.

    Call this when done with the database to properly clean up connections.
    After calling dispose(), the instance should not be used further.
    """
    if self._disposed:
        return

    if hasattr(self, "Base") and self.Base is not None:
        self.Base.registry.dispose()
    if hasattr(self, "engine") and self.engine is not None:
        self.engine.dispose()

    self._disposed = True

find_table

find_table(table_name: str) -> SQLTable

Find a table by name.

Handles both schema.table format and schema_table format (for in-memory databases).

Parameters:

Name Type Description Default
table_name str

Table name, with or without schema prefix. Can be "schema.table", "schema_table", or just "table".

required

Returns:

Type Description
Table

SQLAlchemy Table object.

Raises:

Type Description
KeyError

If table not found.

Source code in src/deriva_ml/model/schema_builder.py
def find_table(self, table_name: str) -> SQLTable:
    """Find a table by name.

    Handles both schema.table format and schema_table format (for in-memory databases).

    Args:
        table_name: Table name, with or without schema prefix.
            Can be "schema.table", "schema_table", or just "table".

    Returns:
        SQLAlchemy Table object.

    Raises:
        KeyError: If table not found.
    """
    # Try exact match first
    if table_name in self.metadata.tables:
        return self.metadata.tables[table_name]

    # Try converting schema.table to schema_table format (for in-memory)
    if "." in table_name and not self._use_schemas:
        converted_name = table_name.replace(".", "_").replace("-", "_")
        if converted_name in self.metadata.tables:
            return self.metadata.tables[converted_name]

    # Try matching just the table name part
    for full_name, table in self.metadata.tables.items():
        # Handle . separator (file-based)
        if "." in full_name and full_name.split(".")[-1] == table_name:
            return table
        # Handle _ separator (in-memory) - match suffix after first _
        if "_" in full_name and "." not in full_name:
            # Check if table_name matches the part after schema prefix
            parts = full_name.split("_", 1)
            if len(parts) > 1 and parts[1] == table_name:
                return table
            # Also check if it ends with the table name
            if full_name.endswith(f"_{table_name}"):
                return table

    raise KeyError(f"Table {table_name} not found")

get_association_class

get_association_class(
    left_cls: Type[Any],
    right_cls: Type[Any],
) -> tuple[Any, Any, Any] | None

Find an association class connecting two ORM classes.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `left_cls` | `Type[Any]` | First ORM class. | *required* |
| `right_cls` | `Type[Any]` | Second ORM class. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `tuple[Any, Any, Any] \| None` | Tuple of (association_class, left_relationship, right_relationship), or None if no association found. |

Source code in src/deriva_ml/model/schema_builder.py
def get_association_class(
    self,
    left_cls: Type[Any],
    right_cls: Type[Any],
) -> tuple[Any, Any, Any] | None:
    """Find an association class connecting two ORM classes.

    Args:
        left_cls: First ORM class.
        right_cls: Second ORM class.

    Returns:
        Tuple of (association_class, left_relationship, right_relationship),
        or None if no association found.
    """
    for _, left_rel in inspect(left_cls).relationships.items():
        mid_cls = left_rel.mapper.class_
        is_assoc = self.is_association_table(mid_cls, return_fkeys=True)

        if not is_assoc:
            continue

        assoc_local_columns_left = list(is_assoc)[0].local_columns
        assoc_local_columns_right = list(is_assoc)[1].local_columns

        found_left = found_right = False

        for r in inspect(left_cls).relationships.values():
            remote_side = list(r.remote_side)[0]
            if remote_side in assoc_local_columns_left:
                found_left = r
            if remote_side in assoc_local_columns_right:
                found_left = r
                # Swap if backwards
                assoc_local_columns_left, assoc_local_columns_right = (
                    assoc_local_columns_right,
                    assoc_local_columns_left,
                )

        for r in inspect(right_cls).relationships.values():
            remote_side = list(r.remote_side)[0]
            if remote_side in assoc_local_columns_right:
                found_right = r

        if found_left and found_right:
            return mid_cls, found_left.class_attribute, found_right.class_attribute

    return None
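The heart of the method is deciding which of the association's two foreign keys belongs to which endpoint, swapping when the first guess is backwards. A dependency-free sketch of that matching idea, with column sets standing in for FK local columns (names are illustrative):

```python
# Minimal sketch of the side-matching logic in get_association_class: given
# an association's two FK column sets and the column each endpoint's
# relationship targets, pair them up, swapping if the first guess is backwards.
def match_sides(fk_a: set[str], fk_b: set[str], left_col: str, right_col: str):
    left_fk, right_fk = fk_a, fk_b
    if left_col in right_fk:                 # first guess was backwards: swap
        left_fk, right_fk = right_fk, left_fk
    if left_col in left_fk and right_col in right_fk:
        return left_fk, right_fk
    return None                              # no association between the two

print(match_sides({"Subject"}, {"Sample"}, "Sample", "Subject"))
# -> ({'Sample'}, {'Subject'})
```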

get_orm_class

get_orm_class(
    table_name: str,
) -> Any | None

Get the ORM class for a table by name.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `table_name` | `str` | Table name, with or without schema prefix. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `Any \| None` | SQLAlchemy ORM class for the table. |

Raises:

| Type | Description |
|------|-------------|
| `KeyError` | If table not found. |

Source code in src/deriva_ml/model/schema_builder.py
def get_orm_class(self, table_name: str) -> Any | None:
    """Get the ORM class for a table by name.

    Args:
        table_name: Table name, with or without schema prefix.

    Returns:
        SQLAlchemy ORM class for the table.

    Raises:
        KeyError: If table not found.
    """
    sql_table = self.find_table(table_name)
    return self.get_orm_class_for_table(sql_table)

get_orm_class_for_table

get_orm_class_for_table(
    table: SQLTable | DerivaTable | str,
) -> Any | None

Get the ORM class for a table.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `table` | `SQLTable \| DerivaTable \| str` | SQLAlchemy Table, Deriva Table, or table name. | *required* |

Returns:

| Type | Description |
|------|-------------|
| `Any \| None` | SQLAlchemy ORM class, or None if not found. |

Source code in src/deriva_ml/model/schema_builder.py
def get_orm_class_for_table(self, table: SQLTable | DerivaTable | str) -> Any | None:
    """Get the ORM class for a table.

    Args:
        table: SQLAlchemy Table, Deriva Table, or table name.

    Returns:
        SQLAlchemy ORM class, or None if not found.
    """
    if isinstance(table, DerivaTable):
        # Try schema.table format first (file-based), then schema_table (in-memory)
        table_key = f"{table.schema.name}.{table.name}"
        table = self.metadata.tables.get(table_key)
        if table is None and not self._use_schemas:
            # Try underscore format for in-memory databases
            table_key = f"{table.schema.name}_{table.name}".replace("-", "_")
            table = self.metadata.tables.get(table_key)
    if isinstance(table, str):
        table = self.find_table(table)
    if table is None:
        return None

    for mapper in self.Base.registry.mappers:
        if mapper.persist_selectable is table or table in mapper.tables:
            return mapper.class_
    return None

get_table_contents

get_table_contents(
    table: str,
) -> Generator[
    dict[str, Any], None, None
]

Retrieve all rows from a table as dictionaries.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `table` | `str` | Table name (with or without schema prefix). | *required* |

Yields:

| Type | Description |
|------|-------------|
| `dict[str, Any]` | Dictionary for each row with column names as keys. |

Source code in src/deriva_ml/model/schema_builder.py
def get_table_contents(self, table: str) -> Generator[dict[str, Any], None, None]:
    """Retrieve all rows from a table as dictionaries.

    Args:
        table: Table name (with or without schema prefix).

    Yields:
        Dictionary for each row with column names as keys.
    """
    sql_table = self.find_table(table)
    with self.engine.connect() as conn:
        result = conn.execute(select(sql_table))
        for row in result.mappings():
            yield dict(row)
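The same stream-rows-as-dicts pattern can be shown with the stdlib `sqlite3` module alone, which is what the BDBag-backed database uses under the hood; the table and column names below are illustrative:

```python
# Streaming every row of a table as a dict, one at a time, using only the
# stdlib sqlite3 module. Mirrors the get_table_contents pattern above.
import sqlite3
from typing import Any, Generator

def table_contents(conn: sqlite3.Connection, table: str) -> Generator[dict[str, Any], None, None]:
    conn.row_factory = sqlite3.Row           # rows become name-addressable
    for row in conn.execute(f"SELECT * FROM {table}"):
        yield dict(row)                      # sqlite3.Row -> plain dict

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Subject (RID TEXT, Name TEXT)")
conn.execute("INSERT INTO Subject VALUES ('1-abc', 'mouse-01')")
print(list(table_contents(conn, "Subject")))
# -> [{'RID': '1-abc', 'Name': 'mouse-01'}]
```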

is_association_table staticmethod

is_association_table(
    table_class,
    min_arity: int = 2,
    max_arity: int = 2,
    unqualified: bool = True,
    pure: bool = True,
    no_overlap: bool = True,
    return_fkeys: bool = False,
)

Check if an ORM class represents an association table.

An association table links two or more tables through foreign keys, with a composite unique key covering those foreign keys.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `table_class` | | SQLAlchemy ORM class to check. | *required* |
| `min_arity` | `int` | Minimum number of foreign keys. | `2` |
| `max_arity` | `int` | Maximum number of foreign keys. | `2` |
| `unqualified` | `bool` | If True, reject associations with extra key columns. | `True` |
| `pure` | `bool` | If True, reject associations with extra non-key columns. | `True` |
| `no_overlap` | `bool` | If True, reject associations with shared FK columns. | `True` |
| `return_fkeys` | `bool` | If True, return the foreign keys instead of arity. | `False` |

Returns:

- If `return_fkeys=False`: integer arity if association, `False` otherwise.
- If `return_fkeys=True`: set of foreign keys if association, `False` otherwise.

Source code in src/deriva_ml/model/schema_builder.py
@staticmethod
def is_association_table(
    table_class,
    min_arity: int = 2,
    max_arity: int = 2,
    unqualified: bool = True,
    pure: bool = True,
    no_overlap: bool = True,
    return_fkeys: bool = False,
):
    """Check if an ORM class represents an association table.

    An association table links two or more tables through foreign keys,
    with a composite unique key covering those foreign keys.

    Args:
        table_class: SQLAlchemy ORM class to check.
        min_arity: Minimum number of foreign keys (default 2).
        max_arity: Maximum number of foreign keys (default 2).
        unqualified: If True, reject associations with extra key columns.
        pure: If True, reject associations with extra non-key columns.
        no_overlap: If True, reject associations with shared FK columns.
        return_fkeys: If True, return the foreign keys instead of arity.

    Returns:
        If return_fkeys=False: Integer arity if association, False otherwise.
        If return_fkeys=True: Set of foreign keys if association, False otherwise.
    """
    if min_arity < 2:
        raise ValueError("An association cannot have arity < 2")
    if max_arity is not None and max_arity < min_arity:
        raise ValueError("max_arity cannot be less than min_arity")

    mapper = inspect(table_class).mapper
    system_cols = {"RID", "RCT", "RMT", "RCB", "RMB"}

    non_sys_cols = {
        col.name for col in mapper.columns if col.name not in system_cols
    }

    unique_columns = [
        {c.name for c in constraint.columns}
        for constraint in inspect(table_class).local_table.constraints
        if isinstance(constraint, SQLUniqueConstraint)
    ]

    non_sys_key_colsets = {
        frozenset(uc)
        for uc in unique_columns
        if uc.issubset(non_sys_cols) and len(uc) > 1
    }

    if not non_sys_key_colsets:
        return False

    # Choose longest compound key
    row_key = sorted(non_sys_key_colsets, key=lambda s: len(s), reverse=True)[0]
    foreign_keys = list(inspect(table_class).relationships.values())

    covered_fkeys = {
        fkey for fkey in foreign_keys
        if {c.name for c in fkey.local_columns}.issubset(row_key)
    }
    covered_fkey_cols = set()

    if len(covered_fkeys) < min_arity:
        return False
    if max_arity is not None and len(covered_fkeys) > max_arity:
        return False

    for fkey in covered_fkeys:
        fkcols = {c.name for c in fkey.local_columns}
        if no_overlap and fkcols.intersection(covered_fkey_cols):
            return False
        covered_fkey_cols.update(fkcols)

    if unqualified and row_key.difference(covered_fkey_cols):
        return False

    if pure and non_sys_cols.difference(row_key):
        return False

    return covered_fkeys if return_fkeys else len(covered_fkeys)
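The shape of this test can be reproduced without SQLAlchemy: a table is an association of arity N when its longest non-system compound unique key is exactly covered by N foreign keys and (if pure) no other columns remain. A dependency-free sketch with plain column sets (the overlap check is omitted for brevity, and all names are illustrative):

```python
# Dependency-free sketch of the association check above. Tables are modeled
# as plain sets of column names; foreign keys and unique keys as column sets.
SYSTEM_COLS = {"RID", "RCT", "RMT", "RCB", "RMB"}

def association_arity(columns: set[str], unique_keys: list[set[str]],
                      fkeys: list[set[str]], min_arity: int = 2,
                      max_arity: int = 2, pure: bool = True):
    non_sys = columns - SYSTEM_COLS
    candidates = [k for k in unique_keys if k <= non_sys and len(k) > 1]
    if not candidates:
        return False
    row_key = max(candidates, key=len)            # longest compound key
    covered = [fk for fk in fkeys if fk <= row_key]
    if not (min_arity <= len(covered) <= max_arity):
        return False
    covered_cols = set().union(*covered)
    if row_key - covered_cols:                    # key has non-FK columns
        return False
    if pure and non_sys - row_key:                # extra payload columns
        return False
    return len(covered)

# A classic two-way association: (Subject, Sample) with a composite key.
print(association_arity(
    columns={"RID", "RCT", "Subject", "Sample"},
    unique_keys=[{"Subject", "Sample"}],
    fkeys=[{"Subject"}, {"Sample"}],
))  # -> 2
```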

list_tables

list_tables() -> list[str]

List all tables in the database.

Returns:

| Type | Description |
|------|-------------|
| `list[str]` | List of fully-qualified table names (schema.table), sorted. |

Source code in src/deriva_ml/model/schema_builder.py
def list_tables(self) -> list[str]:
    """List all tables in the database.

    Returns:
        List of fully-qualified table names (schema.table), sorted.
    """
    tables = list(self.metadata.tables.keys())
    tables.sort()
    return tables

SortKey dataclass

A sort key for row ordering.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `column` | `str` | Column name to sort by | *required* |
| `descending` | `bool` | Sort in descending order | `False` |

Example

SortKey("Name")  # Ascending
SortKey("Created", descending=True)  # Descending

Source code in src/deriva_ml/model/annotations.py
@dataclass
class SortKey:
    """A sort key for row ordering.

    Args:
        column: Column name to sort by
        descending: Sort in descending order (default False)

    Example:
        >>> SortKey("Name")  # Ascending
        >>> SortKey("Created", descending=True)  # Descending
    """
    column: str
    descending: bool = False

    def to_dict(self) -> dict[str, Any] | str:
        """Convert to dict or string (if ascending)."""
        if self.descending:
            return {"column": self.column, "descending": True}
        return self.column

to_dict

to_dict() -> dict[str, Any] | str

Convert to dict or string (if ascending).

Source code in src/deriva_ml/model/annotations.py
def to_dict(self) -> dict[str, Any] | str:
    """Convert to dict or string (if ascending)."""
    if self.descending:
        return {"column": self.column, "descending": True}
    return self.column

TableDisplay dataclass

Bases: AnnotationBuilder

Table-display annotation builder.

Controls table-level display options like row naming and ordering.

Example

td = TableDisplay()
td.row_name(row_markdown_pattern="{{{Name}}} ({{{Species}}})")
td.compact(TableDisplayOptions(row_order=[SortKey("Name")]))

Source code in src/deriva_ml/model/annotations.py
@dataclass
class TableDisplay(AnnotationBuilder):
    """Table-display annotation builder.

    Controls table-level display options like row naming and ordering.

    Example:
        >>> td = TableDisplay()
        >>> td.row_name(row_markdown_pattern="{{{Name}}} ({{{Species}}})")
        >>> td.compact(TableDisplayOptions(row_order=[SortKey("Name")]))
    """
    tag = TAG_TABLE_DISPLAY

    _contexts: dict[str, TableDisplayOptions | str | None] = field(default_factory=dict)

    def set_context(
        self,
        context: str,
        options: TableDisplayOptions | str | None
    ) -> "TableDisplay":
        """Set options for a context."""
        self._contexts[context] = options
        return self

    def row_name(
        self,
        row_markdown_pattern: str,
        template_engine: TemplateEngine | None = None
    ) -> "TableDisplay":
        """Set row name pattern (used in foreign key dropdowns, etc.)."""
        return self.set_context(
            CONTEXT_ROW_NAME,
            TableDisplayOptions(
                row_markdown_pattern=row_markdown_pattern,
                template_engine=template_engine
            )
        )

    def compact(self, options: TableDisplayOptions) -> "TableDisplay":
        """Set options for compact (list) view."""
        return self.set_context(CONTEXT_COMPACT, options)

    def detailed(self, options: TableDisplayOptions) -> "TableDisplay":
        """Set options for detailed (record) view."""
        return self.set_context(CONTEXT_DETAILED, options)

    def default(self, options: TableDisplayOptions) -> "TableDisplay":
        """Set default options."""
        return self.set_context(CONTEXT_DEFAULT, options)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, options in self._contexts.items():
            if options is None:
                result[context] = None
            elif isinstance(options, str):
                result[context] = options
            else:
                result[context] = options.to_dict()
        return result

compact

compact(
    options: TableDisplayOptions,
) -> "TableDisplay"

Set options for compact (list) view.

Source code in src/deriva_ml/model/annotations.py
def compact(self, options: TableDisplayOptions) -> "TableDisplay":
    """Set options for compact (list) view."""
    return self.set_context(CONTEXT_COMPACT, options)

default

default(
    options: TableDisplayOptions,
) -> "TableDisplay"

Set default options.

Source code in src/deriva_ml/model/annotations.py
def default(self, options: TableDisplayOptions) -> "TableDisplay":
    """Set default options."""
    return self.set_context(CONTEXT_DEFAULT, options)

detailed

detailed(
    options: TableDisplayOptions,
) -> "TableDisplay"

Set options for detailed (record) view.

Source code in src/deriva_ml/model/annotations.py
def detailed(self, options: TableDisplayOptions) -> "TableDisplay":
    """Set options for detailed (record) view."""
    return self.set_context(CONTEXT_DETAILED, options)

row_name

row_name(
    row_markdown_pattern: str,
    template_engine: TemplateEngine
    | None = None,
) -> "TableDisplay"

Set row name pattern (used in foreign key dropdowns, etc.).

Source code in src/deriva_ml/model/annotations.py
def row_name(
    self,
    row_markdown_pattern: str,
    template_engine: TemplateEngine | None = None
) -> "TableDisplay":
    """Set row name pattern (used in foreign key dropdowns, etc.)."""
    return self.set_context(
        CONTEXT_ROW_NAME,
        TableDisplayOptions(
            row_markdown_pattern=row_markdown_pattern,
            template_engine=template_engine
        )
    )

set_context

set_context(
    context: str,
    options: TableDisplayOptions
    | str
    | None,
) -> "TableDisplay"

Set options for a context.

Source code in src/deriva_ml/model/annotations.py
def set_context(
    self,
    context: str,
    options: TableDisplayOptions | str | None
) -> "TableDisplay":
    """Set options for a context."""
    self._contexts[context] = options
    return self

TableDisplayOptions dataclass

Options for a single table display context.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `row_order` | `list[SortKey] \| None` | Sort order for rows | `None` |
| `page_size` | `int \| None` | Number of rows per page | `None` |
| `row_markdown_pattern` | `str \| None` | Template for row names | `None` |
| `page_markdown_pattern` | `str \| None` | Template for page header | `None` |
| `separator_markdown` | `str \| None` | Template between rows | `None` |
| `prefix_markdown` | `str \| None` | Template before rows | `None` |
| `suffix_markdown` | `str \| None` | Template after rows | `None` |
| `template_engine` | `TemplateEngine \| None` | Template engine for patterns | `None` |
| `collapse_toc_panel` | `bool \| None` | Collapse TOC panel | `None` |
| `hide_column_headers` | `bool \| None` | Hide column headers | `None` |
Source code in src/deriva_ml/model/annotations.py
@dataclass
class TableDisplayOptions:
    """Options for a single table display context.

    Args:
        row_order: Sort order for rows
        page_size: Number of rows per page
        row_markdown_pattern: Template for row names
        page_markdown_pattern: Template for page header
        separator_markdown: Template between rows
        prefix_markdown: Template before rows
        suffix_markdown: Template after rows
        template_engine: Template engine for patterns
        collapse_toc_panel: Collapse TOC panel
        hide_column_headers: Hide column headers
    """
    row_order: list[SortKey] | None = None
    page_size: int | None = None
    row_markdown_pattern: str | None = None
    page_markdown_pattern: str | None = None
    separator_markdown: str | None = None
    prefix_markdown: str | None = None
    suffix_markdown: str | None = None
    template_engine: TemplateEngine | None = None
    collapse_toc_panel: bool | None = None
    hide_column_headers: bool | None = None

    def to_dict(self) -> dict[str, Any]:
        result = {}
        if self.row_order is not None:
            result["row_order"] = [
                k.to_dict() if isinstance(k, SortKey) else k
                for k in self.row_order
            ]
        if self.page_size is not None:
            result["page_size"] = self.page_size
        if self.row_markdown_pattern is not None:
            result["row_markdown_pattern"] = self.row_markdown_pattern
        if self.page_markdown_pattern is not None:
            result["page_markdown_pattern"] = self.page_markdown_pattern
        if self.separator_markdown is not None:
            result["separator_markdown"] = self.separator_markdown
        if self.prefix_markdown is not None:
            result["prefix_markdown"] = self.prefix_markdown
        if self.suffix_markdown is not None:
            result["suffix_markdown"] = self.suffix_markdown
        if self.template_engine is not None:
            result["template_engine"] = self.template_engine.value
        if self.collapse_toc_panel is not None:
            result["collapse_toc_panel"] = self.collapse_toc_panel
        if self.hide_column_headers is not None:
            result["hide_column_headers"] = self.hide_column_headers
        return result

TemplateEngine

Bases: str, Enum

Template engine for markdown patterns.

Attributes:

| Name | Description |
|------|-------------|
| `HANDLEBARS` | Use Handlebars.js templating (recommended, more features) |
| `MUSTACHE` | Use Mustache templating (simpler, fewer features) |

Example

display = PseudoColumnDisplay(
    markdown_pattern="[{{{Name}}}]({{{URL}}})",
    template_engine=TemplateEngine.HANDLEBARS
)

Source code in src/deriva_ml/model/annotations.py
class TemplateEngine(str, Enum):
    """Template engine for markdown patterns.

    Attributes:
        HANDLEBARS: Use Handlebars.js templating (recommended, more features)
        MUSTACHE: Use Mustache templating (simpler, fewer features)

    Example:
        >>> display = PseudoColumnDisplay(
        ...     markdown_pattern="[{{{Name}}}]({{{URL}}})",
        ...     template_engine=TemplateEngine.HANDLEBARS
        ... )
    """
    HANDLEBARS = "handlebars"
    MUSTACHE = "mustache"
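Because TemplateEngine subclasses str, members compare equal to their string values and drop straight into annotation JSON. A quick check with the enum copied from the source:

```python
# TemplateEngine copied from the source above: a str-valued Enum compares
# and serializes as a plain string.
import json
from enum import Enum

class TemplateEngine(str, Enum):
    HANDLEBARS = "handlebars"
    MUSTACHE = "mustache"

print(TemplateEngine.HANDLEBARS == "handlebars")                  # -> True
print(json.dumps({"template_engine": TemplateEngine.MUSTACHE.value}))
# -> {"template_engine": "mustache"}
```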

VisibleColumns dataclass

Bases: AnnotationBuilder

Visible-columns annotation builder.

Controls which columns appear in different UI contexts and their order. This is one of the most commonly used annotations for customizing the Chaise interface.

Column entries can be:

- Column names (strings): "Name", "RID", "Description"
- Foreign key references: fk_constraint("schema", "constraint_name")
- Pseudo-columns: PseudoColumn(...) for computed/derived values

Contexts:

- compact: Table/list views (search results, data browser)
- detailed: Single record view (full record page)
- entry: Create/edit forms
- entry/create: Create form only
- entry/edit: Edit form only
- *: Default for all contexts

Example

Basic column lists for different contexts::

>>> vc = VisibleColumns()
>>> vc.compact(["RID", "Name", "Status"])
>>> vc.detailed(["RID", "Name", "Status", "Description", "Created"])
>>> vc.entry(["Name", "Status", "Description"])
>>> handle.set_annotation(vc)

Method chaining::

>>> vc = (VisibleColumns()
...     .compact(["RID", "Name"])
...     .detailed(["RID", "Name", "Description"])
...     .entry(["Name", "Description"]))

Including foreign key references::

>>> vc = VisibleColumns()
>>> vc.compact([
...     "RID",
...     "Name",
...     fk_constraint("domain", "Subject_Species_fkey"),
... ])

With pseudo-columns for computed values::

>>> vc = VisibleColumns()
>>> vc.compact([
...     "RID",
...     "Name",
...     PseudoColumn(
...         source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
...         aggregate=Aggregate.CNT,
...         markdown_name="Samples"
...     ),
... ])

Context inheritance (reference another context)::

>>> vc = VisibleColumns()
>>> vc.compact(["RID", "Name"])
>>> vc.set_context("compact/brief", "compact")  # Inherit from compact

With faceted search (filter context)::

>>> vc = VisibleColumns()
>>> vc.compact(["RID", "Name", "Status"])
>>> facets = FacetList()
>>> facets.add(Facet(source="Status", open=True))
>>> vc._contexts["filter"] = facets.to_dict()
Source code in src/deriva_ml/model/annotations.py
@dataclass
class VisibleColumns(AnnotationBuilder):
    """Visible-columns annotation builder.

    Controls which columns appear in different UI contexts and their order.
    This is one of the most commonly used annotations for customizing the
    Chaise interface.

    Column entries can be:
    - Column names (strings): "Name", "RID", "Description"
    - Foreign key references: fk_constraint("schema", "constraint_name")
    - Pseudo-columns: PseudoColumn(...) for computed/derived values

    Contexts:
    - ``compact``: Table/list views (search results, data browser)
    - ``detailed``: Single record view (full record page)
    - ``entry``: Create/edit forms
    - ``entry/create``: Create form only
    - ``entry/edit``: Edit form only
    - ``*``: Default for all contexts

    Example:
        Basic column lists for different contexts::

            >>> vc = VisibleColumns()
            >>> vc.compact(["RID", "Name", "Status"])
            >>> vc.detailed(["RID", "Name", "Status", "Description", "Created"])
            >>> vc.entry(["Name", "Status", "Description"])
            >>> handle.set_annotation(vc)

        Method chaining::

            >>> vc = (VisibleColumns()
            ...     .compact(["RID", "Name"])
            ...     .detailed(["RID", "Name", "Description"])
            ...     .entry(["Name", "Description"]))

        Including foreign key references::

            >>> vc = VisibleColumns()
            >>> vc.compact([
            ...     "RID",
            ...     "Name",
            ...     fk_constraint("domain", "Subject_Species_fkey"),
            ... ])

        With pseudo-columns for computed values::

            >>> vc = VisibleColumns()
            >>> vc.compact([
            ...     "RID",
            ...     "Name",
            ...     PseudoColumn(
            ...         source=[InboundFK("domain", "Sample_Subject_fkey"), "RID"],
            ...         aggregate=Aggregate.CNT,
            ...         markdown_name="Samples"
            ...     ),
            ... ])

        Context inheritance (reference another context)::

            >>> vc = VisibleColumns()
            >>> vc.compact(["RID", "Name"])
            >>> vc.set_context("compact/brief", "compact")  # Inherit from compact

        With faceted search (filter context)::

            >>> vc = VisibleColumns()
            >>> vc.compact(["RID", "Name", "Status"])
            >>> facets = FacetList()
            >>> facets.add(Facet(source="Status", open=True))
            >>> vc._contexts["filter"] = facets.to_dict()
    """
    tag = TAG_VISIBLE_COLUMNS

    _contexts: dict[str, list[ColumnEntry] | str] = field(default_factory=dict)

    def set_context(
        self,
        context: str,
        columns: list[ColumnEntry] | str
    ) -> "VisibleColumns":
        """Set columns for a context.

        Args:
            context: Context name (e.g., "compact", "detailed", "*")
            columns: List of columns, or string referencing another context

        Returns:
            Self for chaining
        """
        self._contexts[context] = columns
        return self

    def compact(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for compact (list) view."""
        return self.set_context(CONTEXT_COMPACT, columns)

    def detailed(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for detailed (record) view."""
        return self.set_context(CONTEXT_DETAILED, columns)

    def entry(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for entry (create/edit) forms."""
        return self.set_context(CONTEXT_ENTRY, columns)

    def entry_create(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for create form only."""
        return self.set_context(CONTEXT_ENTRY_CREATE, columns)

    def entry_edit(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set columns for edit form only."""
        return self.set_context(CONTEXT_ENTRY_EDIT, columns)

    def default(self, columns: list[ColumnEntry]) -> "VisibleColumns":
        """Set default columns for all contexts."""
        return self.set_context(CONTEXT_DEFAULT, columns)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, columns in self._contexts.items():
            if isinstance(columns, str):
                result[context] = columns
            else:
                result[context] = [
                    c.to_dict() if isinstance(c, PseudoColumn) else c
                    for c in columns
                ]
        return result

compact

compact(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for compact (list) view.

Source code in src/deriva_ml/model/annotations.py
def compact(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for compact (list) view."""
    return self.set_context(CONTEXT_COMPACT, columns)

default

default(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set default columns for all contexts.

Source code in src/deriva_ml/model/annotations.py
def default(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set default columns for all contexts."""
    return self.set_context(CONTEXT_DEFAULT, columns)

detailed

detailed(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for detailed (record) view.

Source code in src/deriva_ml/model/annotations.py
def detailed(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for detailed (record) view."""
    return self.set_context(CONTEXT_DETAILED, columns)

entry

entry(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for entry (create/edit) forms.

Source code in src/deriva_ml/model/annotations.py
def entry(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for entry (create/edit) forms."""
    return self.set_context(CONTEXT_ENTRY, columns)

entry_create

entry_create(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for create form only.

Source code in src/deriva_ml/model/annotations.py
def entry_create(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for create form only."""
    return self.set_context(CONTEXT_ENTRY_CREATE, columns)

entry_edit

entry_edit(
    columns: list[ColumnEntry],
) -> "VisibleColumns"

Set columns for edit form only.

Source code in src/deriva_ml/model/annotations.py
def entry_edit(self, columns: list[ColumnEntry]) -> "VisibleColumns":
    """Set columns for edit form only."""
    return self.set_context(CONTEXT_ENTRY_EDIT, columns)

set_context

set_context(
    context: str,
    columns: list[ColumnEntry] | str,
) -> "VisibleColumns"

Set columns for a context.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| `context` | `str` | Context name (e.g., "compact", "detailed", "*") | *required* |
| `columns` | `list[ColumnEntry] \| str` | List of columns, or string referencing another context | *required* |

Returns:

| Type | Description |
|------|-------------|
| `VisibleColumns` | Self for chaining |

Source code in src/deriva_ml/model/annotations.py
def set_context(
    self,
    context: str,
    columns: list[ColumnEntry] | str
) -> "VisibleColumns":
    """Set columns for a context.

    Args:
        context: Context name (e.g., "compact", "detailed", "*")
        columns: List of columns, or string referencing another context

    Returns:
        Self for chaining
    """
    self._contexts[context] = columns
    return self

VisibleForeignKeys dataclass

Bases: AnnotationBuilder

Visible-foreign-keys annotation builder.

Controls which related tables appear in the UI via inbound foreign keys.

Example

vfk = VisibleForeignKeys()
vfk.detailed([
    fk_constraint("domain", "Image_Subject_fkey"),
    fk_constraint("domain", "Diagnosis_Subject_fkey")
])

Source code in src/deriva_ml/model/annotations.py
@dataclass
class VisibleForeignKeys(AnnotationBuilder):
    """Visible-foreign-keys annotation builder.

    Controls which related tables appear in the UI via inbound foreign keys.

    Example:
        >>> vfk = VisibleForeignKeys()
        >>> vfk.detailed([
        ...     fk_constraint("domain", "Image_Subject_fkey"),
        ...     fk_constraint("domain", "Diagnosis_Subject_fkey")
        ... ])
    """
    tag = TAG_VISIBLE_FOREIGN_KEYS

    _contexts: dict[str, list[ForeignKeyEntry] | str] = field(default_factory=dict)

    def set_context(
        self,
        context: str,
        foreign_keys: list[ForeignKeyEntry] | str
    ) -> "VisibleForeignKeys":
        """Set foreign keys for a context."""
        self._contexts[context] = foreign_keys
        return self

    def detailed(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
        """Set foreign keys for detailed view."""
        return self.set_context(CONTEXT_DETAILED, foreign_keys)

    def default(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
        """Set default foreign keys for all contexts."""
        return self.set_context(CONTEXT_DEFAULT, foreign_keys)

    def to_dict(self) -> dict[str, Any]:
        result = {}
        for context, fkeys in self._contexts.items():
            if isinstance(fkeys, str):
                result[context] = fkeys
            else:
                result[context] = [
                    fk.to_dict() if isinstance(fk, PseudoColumn) else fk
                    for fk in fkeys
                ]
        return result
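The serialization performed by to_dict() can be sketched in isolation: string aliases pass through unchanged, while list entries are emitted one by one (builder objects such as PseudoColumn would additionally be expanded via their own to_dict()). Plain [schema, constraint] pairs stand in for real ForeignKeyEntry values here:

```python
# Hypothetical input: one context with explicit FK entries, one aliasing it.
contexts = {
    "detailed": [["domain", "Image_Subject_fkey"],
                 ["domain", "Diagnosis_Subject_fkey"]],
    "compact": "detailed",  # string alias to another context
}

def to_dict(contexts):
    # Mirror of the serialization above, minus PseudoColumn expansion.
    result = {}
    for context, fkeys in contexts.items():
        result[context] = fkeys if isinstance(fkeys, str) else list(fkeys)
    return result

annotation = to_dict(contexts)
```

The result is plain JSON-serializable data, which is what ERMrest expects when the annotation is applied to a table.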

default

default(
    foreign_keys: list[ForeignKeyEntry],
) -> "VisibleForeignKeys"

Set default foreign keys for all contexts.

Source code in src/deriva_ml/model/annotations.py, lines 897-899
def default(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
    """Set default foreign keys for all contexts."""
    return self.set_context(CONTEXT_DEFAULT, foreign_keys)

detailed

detailed(
    foreign_keys: list[ForeignKeyEntry],
) -> "VisibleForeignKeys"

Set foreign keys for detailed view.

Source code in src/deriva_ml/model/annotations.py, lines 893-895
def detailed(self, foreign_keys: list[ForeignKeyEntry]) -> "VisibleForeignKeys":
    """Set foreign keys for detailed view."""
    return self.set_context(CONTEXT_DETAILED, foreign_keys)

set_context

set_context(
    context: str,
    foreign_keys: list[ForeignKeyEntry] | str,
) -> "VisibleForeignKeys"

Set foreign keys for a context.

Source code in src/deriva_ml/model/annotations.py, lines 884-891
def set_context(
    self,
    context: str,
    foreign_keys: list[ForeignKeyEntry] | str
) -> "VisibleForeignKeys":
    """Set foreign keys for a context."""
    self._contexts[context] = foreign_keys
    return self

__getattr__

__getattr__(name: str)

Lazy import for DatabaseModel and DerivaMLDatabase.

Source code in src/deriva_ml/model/__init__.py, lines 110-120
def __getattr__(name: str):
    """Lazy import for DatabaseModel and DerivaMLDatabase."""
    if name == "DatabaseModel":
        from deriva_ml.model.database import DatabaseModel

        return DatabaseModel
    if name == "DerivaMLDatabase":
        from deriva_ml.model.deriva_ml_database import DerivaMLDatabase

        return DerivaMLDatabase
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
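The lazy-import mechanism above relies on PEP 562 module-level `__getattr__`: the deferred import runs only on first attribute access, which is what breaks the circular dependency with the dataset module. A minimal sketch using a synthetic module (all names here are illustrative, not DerivaML's):

```python
import sys
import types

# Build a throwaway module so the sketch is self-contained.
mod = types.ModuleType("lazy_demo")

def __getattr__(name: str):
    """Resolve heavy names on first access instead of at import time."""
    if name == "HeavyThing":
        # In real code a deferred `from ... import ...` would go here.
        class HeavyThing:
            pass
        return HeavyThing
    raise AttributeError(f"module 'lazy_demo' has no attribute {name!r}")

mod.__getattr__ = __getattr__
sys.modules["lazy_demo"] = mod

import lazy_demo
cls = lazy_demo.HeavyThing  # only now does __getattr__ run
```

Because Python caches real imports in sys.modules, repeating the deferred import on every access is cheap after the first call.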

fk_constraint

fk_constraint(
    schema: str, constraint: str
) -> list[str]

Create a foreign key constraint reference for visible-columns.

Use this in visible-columns to include a foreign key column (showing the referenced row's name/link). This is different from InboundFK/OutboundFK, which are used inside PseudoColumn source paths.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| schema | str | Schema name containing the FK constraint | required |
| constraint | str | Foreign key constraint name | required |

Returns:

| Type | Description |
| --- | --- |
| list[str] | [schema, constraint] list for use in visible-columns |

Example

Include a foreign key in visible columns:

>>> vc = VisibleColumns()
>>> vc.compact([
...     "RID",
...     "Name",
...     fk_constraint("domain", "Subject_Species_fkey"),  # Shows Species
... ])

This is equivalent to the raw format:

>>> vc.compact(["RID", "Name", ["domain", "Subject_Species_fkey"]])
Source code in src/deriva_ml/model/annotations.py, lines 511-539
def fk_constraint(schema: str, constraint: str) -> list[str]:
    """Create a foreign key constraint reference for visible-columns.

    Use this in visible-columns to include a foreign key column (showing the
    referenced row's name/link). This is different from InboundFK/OutboundFK
    which are used inside PseudoColumn source paths.

    Args:
        schema: Schema name containing the FK constraint
        constraint: Foreign key constraint name

    Returns:
        [schema, constraint] list for use in visible-columns

    Example:
        Include a foreign key in visible columns::

            >>> vc = VisibleColumns()
            >>> vc.compact([
            ...     "RID",
            ...     "Name",
            ...     fk_constraint("domain", "Subject_Species_fkey"),  # Shows Species
            ... ])

        This is equivalent to the raw format::

            >>> vc.compact(["RID", "Name", ["domain", "Subject_Species_fkey"]])
    """
    return [schema, constraint]
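Since fk_constraint just returns a two-element list, the resulting visible-columns annotation is plain data with no special objects in it. A self-contained sketch (the helper is redefined locally for illustration):

```python
def fk_constraint(schema: str, constraint: str) -> list[str]:
    # Same shape as the library helper: a [schema, constraint] pair.
    return [schema, constraint]

# Mix plain column names with an FK constraint reference, exactly as in
# the example above, and place the list under a display context.
compact = ["RID", "Name", fk_constraint("domain", "Subject_Species_fkey")]
annotation = {"compact": compact}
```

The helper exists purely for readability; writing the raw pair `["domain", "Subject_Species_fkey"]` inline produces an identical annotation.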