Definitions & Types

Core type definitions, enums, and constants used throughout DerivaML. This includes vocabulary types, status enums, column definitions, and other foundational types.

Shared definitions for DerivaML modules.

This module serves as the central location for type definitions, constants, enums, and data models used throughout DerivaML. It re-exports symbols from specialized submodules for convenience and backwards compatibility.

The module consolidates:
  • Constants: Schema names, RID patterns, column definitions
  • Enums: Status codes, upload states, built-in types, vocabulary identifiers
  • Models: Dataclass-based models for ERMrest structures (tables, columns, keys)
  • Utilities: FileSpec for file metadata handling

Core definition classes (ColumnDef, KeyDef, ForeignKeyDef, TableDef) are provided by deriva.core.typed and re-exported here. Legacy aliases (ColumnDefinition, etc.) are maintained for backwards compatibility.

For more specialized imports, you can import directly from submodules:

>>> from deriva_ml.core.constants import ML_SCHEMA
>>> from deriva_ml.core.enums import Status
>>> from deriva.core.typed import ColumnDef

BuiltinTypes module-attribute

BuiltinTypes = BuiltinType

Alias for BuiltinType from deriva.core.typed.

This maintains backwards compatibility with existing DerivaML code that uses the plural form 'BuiltinTypes'. New code should use BuiltinType directly.

ColumnDefinition module-attribute

ColumnDefinition = ColumnDef

Alias for ColumnDef from deriva.core.typed.

This maintains backwards compatibility with existing DerivaML code. New code should use ColumnDef directly.

ForeignKeyDefinition module-attribute

ForeignKeyDefinition = ForeignKeyDef

Alias for ForeignKeyDef from deriva.core.typed.

This maintains backwards compatibility with existing DerivaML code. New code should use ForeignKeyDef directly.

KeyDefinition module-attribute

KeyDefinition = KeyDef

Alias for KeyDef from deriva.core.typed.

This maintains backwards compatibility with existing DerivaML code. New code should use KeyDef directly.

TableDefinition module-attribute

TableDefinition = TableDef

Alias for TableDef from deriva.core.typed.

This maintains backwards compatibility with existing DerivaML code. New code should use TableDef directly.

BaseStrEnum

Bases: str, Enum

Base class for string-based enumerations.

Extends both str and Enum to create string enums that are both string-like and enumerated. This provides type safety while maintaining string compatibility.

Example

>>> class MyEnum(BaseStrEnum):
...     VALUE = "value"
>>> isinstance(MyEnum.VALUE, str)   # True
>>> isinstance(MyEnum.VALUE, Enum)  # True

Source code in src/deriva_ml/core/enums.py
class BaseStrEnum(str, Enum):
    """Base class for string-based enumerations.

    Extends both str and Enum to create string enums that are both string-like and enumerated.
    This provides type safety while maintaining string compatibility.

    Example:
        >>> class MyEnum(BaseStrEnum):
        ...     VALUE = "value"
        >>> isinstance(MyEnum.VALUE, str)  # True
        >>> isinstance(MyEnum.VALUE, Enum)  # True
    """

    pass
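
The str/Enum mixin pattern above can be sketched with the standard library alone. A minimal, self-contained illustration (the real class lives in deriva_ml.core.enums; the Status subclass here is illustrative):

```python
from enum import Enum

class BaseStrEnum(str, Enum):
    """String-valued enum: members are simultaneously str and Enum instances."""
    pass

class Status(BaseStrEnum):
    running = "Running"
    completed = "Completed"

# Members compare equal to their string values and support all str methods,
# so they can be passed anywhere a plain string is expected.
print(Status.running == "Running")   # True
print(Status.running.upper())        # RUNNING
```

Because members are real strings, they serialize cleanly into URLs, JSON payloads, and catalog column values without explicit conversion.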

DerivaMLAuthenticationError

Bases: DerivaMLConfigurationError

Exception raised for authentication failures.

Raised when authentication with the catalog fails or credentials are invalid.

Example

raise DerivaMLAuthenticationError("Failed to authenticate with catalog")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLAuthenticationError(DerivaMLConfigurationError):
    """Exception raised for authentication failures.

    Raised when authentication with the catalog fails or credentials are invalid.

    Example:
        >>> raise DerivaMLAuthenticationError("Failed to authenticate with catalog")
    """

    pass

DerivaMLConfigurationError

Bases: DerivaMLException

Exception raised for configuration and initialization errors.

Raised when there are issues with DerivaML configuration, catalog initialization, or schema setup.

Example

raise DerivaMLConfigurationError("Invalid catalog configuration")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLConfigurationError(DerivaMLException):
    """Exception raised for configuration and initialization errors.

    Raised when there are issues with DerivaML configuration, catalog
    initialization, or schema setup.

    Example:
        >>> raise DerivaMLConfigurationError("Invalid catalog configuration")
    """

    pass

DerivaMLCycleError

Bases: DerivaMLDataError

Exception raised when a cycle is detected in relationships.

Raised when creating dataset hierarchies or other relationships that would result in a circular dependency.

Parameters:

  • cycle_nodes (list[str]): List of nodes involved in the cycle. Required.
  • msg (str): Additional context. Defaults to "Cycle detected".
Example

raise DerivaMLCycleError(["Dataset1", "Dataset2", "Dataset1"])

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLCycleError(DerivaMLDataError):
    """Exception raised when a cycle is detected in relationships.

    Raised when creating dataset hierarchies or other relationships that
    would result in a circular dependency.

    Args:
        cycle_nodes: List of nodes involved in the cycle.
        msg: Additional context. Defaults to "Cycle detected".

    Example:
        >>> raise DerivaMLCycleError(["Dataset1", "Dataset2", "Dataset1"])
    """

    def __init__(self, cycle_nodes: list[str], msg: str = "Cycle detected") -> None:
        super().__init__(f"{msg}: {cycle_nodes}")
        self.cycle_nodes = cycle_nodes

DerivaMLDataError

Bases: DerivaMLException

Exception raised for data access and validation issues.

Base class for errors related to data lookup, validation, and integrity.

Example

raise DerivaMLDataError("Invalid data format")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDataError(DerivaMLException):
    """Exception raised for data access and validation issues.

    Base class for errors related to data lookup, validation, and integrity.

    Example:
        >>> raise DerivaMLDataError("Invalid data format")
    """

    pass

DerivaMLDatasetNotFound

Bases: DerivaMLNotFoundError

Exception raised when a dataset cannot be found.

Raised when attempting to look up a dataset that doesn't exist in the catalog or downloaded bag.

Parameters:

  • dataset_rid (str): The RID of the dataset that was not found. Required.
  • msg (str): Additional context. Defaults to "Dataset not found".
Example

>>> raise DerivaMLDatasetNotFound("1-ABC")
DerivaMLDatasetNotFound: Dataset not found: 1-ABC

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLDatasetNotFound(DerivaMLNotFoundError):
    """Exception raised when a dataset cannot be found.

    Raised when attempting to look up a dataset that doesn't exist in the
    catalog or downloaded bag.

    Args:
        dataset_rid: The RID of the dataset that was not found.
        msg: Additional context. Defaults to "Dataset not found".

    Example:
        >>> raise DerivaMLDatasetNotFound("1-ABC")
        DerivaMLDatasetNotFound: Dataset not found: 1-ABC
    """

    def __init__(self, dataset_rid: str, msg: str = "Dataset not found") -> None:
        super().__init__(f"{msg}: {dataset_rid}")
        self.dataset_rid = dataset_rid

DerivaMLException

Bases: Exception

Base exception class for all DerivaML errors.

This is the root exception for all DerivaML-specific errors. Catching this exception will catch any error raised by the DerivaML library.

Attributes:

  • _msg: The error message stored for later access.

Parameters:

  • msg (str): Descriptive error message. Defaults to empty string.
Example

>>> raise DerivaMLException("Failed to connect to catalog")
DerivaMLException: Failed to connect to catalog

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLException(Exception):
    """Base exception class for all DerivaML errors.

    This is the root exception for all DerivaML-specific errors. Catching this
    exception will catch any error raised by the DerivaML library.

    Attributes:
        _msg: The error message stored for later access.

    Args:
        msg: Descriptive error message. Defaults to empty string.

    Example:
        >>> raise DerivaMLException("Failed to connect to catalog")
        DerivaMLException: Failed to connect to catalog
    """

    def __init__(self, msg: str = "") -> None:
        super().__init__(msg)
        self._msg = msg
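
Because every library error derives from DerivaMLException, a single except clause at the root catches anything DerivaML raises. A self-contained sketch (class names mirror deriva_ml.core.exceptions; the definitions here are illustrative, not imports):

```python
class DerivaMLException(Exception):
    """Root of the DerivaML exception hierarchy."""
    def __init__(self, msg: str = "") -> None:
        super().__init__(msg)
        self._msg = msg

class DerivaMLDataError(DerivaMLException):
    pass

class DerivaMLNotFoundError(DerivaMLDataError):
    pass

class DerivaMLDatasetNotFound(DerivaMLNotFoundError):
    def __init__(self, dataset_rid: str, msg: str = "Dataset not found") -> None:
        super().__init__(f"{msg}: {dataset_rid}")
        self.dataset_rid = dataset_rid

# Catching the root type handles any error in the hierarchy.
try:
    raise DerivaMLDatasetNotFound("1-ABC")
except DerivaMLException as e:
    print(e)  # Dataset not found: 1-ABC
```

Catch the most specific subclass when you can recover from it, and fall back to DerivaMLException only at application boundaries.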

DerivaMLExecutionError

Bases: DerivaMLException

Exception raised for execution lifecycle issues.

Base class for errors related to workflow execution, asset management, and provenance tracking.

Example

raise DerivaMLExecutionError("Execution failed to initialize")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLExecutionError(DerivaMLException):
    """Exception raised for execution lifecycle issues.

    Base class for errors related to workflow execution, asset management,
    and provenance tracking.

    Example:
        >>> raise DerivaMLExecutionError("Execution failed to initialize")
    """

    pass

DerivaMLInvalidTerm

Bases: DerivaMLNotFoundError

Exception raised when a vocabulary term is not found or invalid.

Raised when attempting to look up or use a term that doesn't exist in a controlled vocabulary table, or when a term name/synonym cannot be resolved.

Parameters:

  • vocabulary (str): Name of the vocabulary table being searched. Required.
  • term (str): The term name that was not found. Required.
  • msg (str): Additional context about the error. Defaults to "Term doesn't exist".
Example

>>> raise DerivaMLInvalidTerm("Diagnosis", "unknown_condition")
DerivaMLInvalidTerm: Invalid term unknown_condition in vocabulary Diagnosis: Term doesn't exist.

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLInvalidTerm(DerivaMLNotFoundError):
    """Exception raised when a vocabulary term is not found or invalid.

    Raised when attempting to look up or use a term that doesn't exist in
    a controlled vocabulary table, or when a term name/synonym cannot be resolved.

    Args:
        vocabulary: Name of the vocabulary table being searched.
        term: The term name that was not found.
        msg: Additional context about the error. Defaults to "Term doesn't exist".

    Example:
        >>> raise DerivaMLInvalidTerm("Diagnosis", "unknown_condition")
        DerivaMLInvalidTerm: Invalid term unknown_condition in vocabulary Diagnosis: Term doesn't exist.
    """

    def __init__(self, vocabulary: str, term: str, msg: str = "Term doesn't exist") -> None:
        super().__init__(f"Invalid term {term} in vocabulary {vocabulary}: {msg}.")
        self.vocabulary = vocabulary
        self.term = term

DerivaMLNotFoundError

Bases: DerivaMLDataError

Exception raised when an entity cannot be found.

Raised when a lookup operation fails to find the requested entity (dataset, table, term, etc.) in the catalog or bag.

Example

raise DerivaMLNotFoundError("Entity '1-ABC' not found in catalog")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLNotFoundError(DerivaMLDataError):
    """Exception raised when an entity cannot be found.

    Raised when a lookup operation fails to find the requested entity
    (dataset, table, term, etc.) in the catalog or bag.

    Example:
        >>> raise DerivaMLNotFoundError("Entity '1-ABC' not found in catalog")
    """

    pass

DerivaMLReadOnlyError

Bases: DerivaMLException

Exception raised when attempting write operations on read-only resources.

Raised when attempting to modify data in a downloaded bag or other read-only context where write operations are not supported.

Example

raise DerivaMLReadOnlyError("Cannot create datasets in a downloaded bag")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLReadOnlyError(DerivaMLException):
    """Exception raised when attempting write operations on read-only resources.

    Raised when attempting to modify data in a downloaded bag or other
    read-only context where write operations are not supported.

    Example:
        >>> raise DerivaMLReadOnlyError("Cannot create datasets in a downloaded bag")
    """

    pass

DerivaMLSchemaError

Bases: DerivaMLConfigurationError

Exception raised for schema or catalog structure issues.

Raised when the catalog schema is invalid, missing required tables, or has structural problems that prevent normal operation.

Example

raise DerivaMLSchemaError("Ambiguous domain schema: ['Schema1', 'Schema2']")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLSchemaError(DerivaMLConfigurationError):
    """Exception raised for schema or catalog structure issues.

    Raised when the catalog schema is invalid, missing required tables,
    or has structural problems that prevent normal operation.

    Example:
        >>> raise DerivaMLSchemaError("Ambiguous domain schema: ['Schema1', 'Schema2']")
    """

    pass

DerivaMLTableNotFound

Bases: DerivaMLNotFoundError

Exception raised when a table cannot be found.

Raised when attempting to access a table that doesn't exist in the catalog schema or downloaded bag.

Parameters:

  • table_name (str): The name of the table that was not found. Required.
  • msg (str): Additional context. Defaults to "Table not found".
Example

>>> raise DerivaMLTableNotFound("MyTable")
DerivaMLTableNotFound: Table not found: MyTable

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLTableNotFound(DerivaMLNotFoundError):
    """Exception raised when a table cannot be found.

    Raised when attempting to access a table that doesn't exist in the
    catalog schema or downloaded bag.

    Args:
        table_name: The name of the table that was not found.
        msg: Additional context. Defaults to "Table not found".

    Example:
        >>> raise DerivaMLTableNotFound("MyTable")
        DerivaMLTableNotFound: Table not found: MyTable
    """

    def __init__(self, table_name: str, msg: str = "Table not found") -> None:
        super().__init__(f"{msg}: {table_name}")
        self.table_name = table_name

DerivaMLTableTypeError

Bases: DerivaMLDataError

Exception raised when a RID or table is not of the expected type.

Raised when an operation requires a specific table type (e.g., Dataset, Execution) but receives a RID or table reference of a different type.

Parameters:

  • table_type (str): The expected table type (e.g., "Dataset", "Execution"). Required.
  • table (str): The actual table name or RID that was provided. Required.
Example

>>> raise DerivaMLTableTypeError("Dataset", "1-ABC123")
DerivaMLTableTypeError: Table 1-ABC123 is not of type Dataset.

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLTableTypeError(DerivaMLDataError):
    """Exception raised when a RID or table is not of the expected type.

    Raised when an operation requires a specific table type (e.g., Dataset,
    Execution) but receives a RID or table reference of a different type.

    Args:
        table_type: The expected table type (e.g., "Dataset", "Execution").
        table: The actual table name or RID that was provided.

    Example:
        >>> raise DerivaMLTableTypeError("Dataset", "1-ABC123")
        DerivaMLTableTypeError: Table 1-ABC123 is not of type Dataset.
    """

    def __init__(self, table_type: str, table: str) -> None:
        super().__init__(f"Table {table} is not of type {table_type}.")
        self.table_type = table_type
        self.table = table

DerivaMLUploadError

Bases: DerivaMLExecutionError

Exception raised for asset upload failures.

Raised when uploading assets to the catalog fails, including file uploads, metadata insertion, and provenance recording.

Example

raise DerivaMLUploadError("Failed to upload execution assets")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLUploadError(DerivaMLExecutionError):
    """Exception raised for asset upload failures.

    Raised when uploading assets to the catalog fails, including file
    uploads, metadata insertion, and provenance recording.

    Example:
        >>> raise DerivaMLUploadError("Failed to upload execution assets")
    """

    pass

DerivaMLValidationError

Bases: DerivaMLDataError

Exception raised when data validation fails.

Raised when input data fails validation, such as invalid RID format, mismatched metadata, or constraint violations.

Example

raise DerivaMLValidationError("Invalid RID format: ABC")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLValidationError(DerivaMLDataError):
    """Exception raised when data validation fails.

    Raised when input data fails validation, such as invalid RID format,
    mismatched metadata, or constraint violations.

    Example:
        >>> raise DerivaMLValidationError("Invalid RID format: ABC")
    """

    pass

DerivaMLWorkflowError

Bases: DerivaMLExecutionError

Exception raised for workflow-related issues.

Raised when there are problems with workflow lookup, creation, or Git integration for workflow tracking.

Example

raise DerivaMLWorkflowError("Not executing in a Git repository")

Source code in src/deriva_ml/core/exceptions.py
class DerivaMLWorkflowError(DerivaMLExecutionError):
    """Exception raised for workflow-related issues.

    Raised when there are problems with workflow lookup, creation, or
    Git integration for workflow tracking.

    Example:
        >>> raise DerivaMLWorkflowError("Not executing in a Git repository")
    """

    pass

ExecAssetType

Bases: BaseStrEnum

Execution asset type identifiers.

Defines the types of assets that can be produced or consumed during an execution. These types are used to categorize files associated with workflow runs.

Attributes:

  • input_file (str): Input file consumed by the execution.
  • output_file (str): Output file produced by the execution.
  • notebook_output (str): Jupyter notebook output from the execution.
  • model_file (str): Machine learning model file (e.g., .pkl, .h5, .pt).

Source code in src/deriva_ml/core/enums.py
class ExecAssetType(BaseStrEnum):
    """Execution asset type identifiers.

    Defines the types of assets that can be produced or consumed during an execution.
    These types are used to categorize files associated with workflow runs.

    Attributes:
        input_file (str): Input file consumed by the execution.
        output_file (str): Output file produced by the execution.
        notebook_output (str): Jupyter notebook output from the execution.
        model_file (str): Machine learning model file (e.g., .pkl, .h5, .pt).
    """

    input_file = "Input_File"
    output_file = "Output_File"
    notebook_output = "Notebook_Output"
    model_file = "Model_File"

ExecMetadataType

Bases: BaseStrEnum

Execution metadata type identifiers.

Defines the types of metadata that can be associated with an execution.

Attributes:

  • execution_config (str): General execution configuration data.
  • runtime_env (str): Runtime environment information.
  • hydra_config (str): Hydra YAML configuration files (config.yaml, overrides.yaml).
  • deriva_config (str): DerivaML execution configuration (configuration.json).

Source code in src/deriva_ml/core/enums.py
class ExecMetadataType(BaseStrEnum):
    """Execution metadata type identifiers.

    Defines the types of metadata that can be associated with an execution.

    Attributes:
        execution_config (str): General execution configuration data.
        runtime_env (str): Runtime environment information.
        hydra_config (str): Hydra YAML configuration files (config.yaml, overrides.yaml).
        deriva_config (str): DerivaML execution configuration (configuration.json).
    """

    execution_config = "Execution_Config"
    runtime_env = "Runtime_Env"
    hydra_config = "Hydra_Config"
    deriva_config = "Deriva_Config"

FileSpec

Bases: BaseModel

Specification for a file to be added to the Deriva catalog.

Represents file metadata required for creating entries in the File table. Handles URL normalization, ensuring local file paths are converted to tag URIs that uniquely identify the file's origin.

Attributes:

  • url (str): File location as URL or local path. Local paths are converted to tag URIs.
  • md5 (str): MD5 checksum for integrity verification.
  • length (int): File size in bytes.
  • description (str | None): Optional description of the file's contents or purpose.
  • file_types (list[str] | None): List of file type classifications from the Asset_Type vocabulary.

Note

The 'File' type is automatically added to file_types if not present when using create_filespecs().

Example

>>> spec = FileSpec(
...     url="/data/results.csv",
...     md5="d41d8cd98f00b204e9800998ecf8427e",
...     length=1024,
...     description="Analysis results",
...     file_types=["CSV", "Data"]
... )

Source code in src/deriva_ml/core/filespec.py
class FileSpec(BaseModel):
    """Specification for a file to be added to the Deriva catalog.

    Represents file metadata required for creating entries in the File table.
    Handles URL normalization, ensuring local file paths are converted to
    tag URIs that uniquely identify the file's origin.

    Attributes:
        url: File location as URL or local path. Local paths are converted to tag URIs.
        md5: MD5 checksum for integrity verification.
        length: File size in bytes.
        description: Optional description of the file's contents or purpose.
        file_types: List of file type classifications from the Asset_Type vocabulary.

    Note:
        The 'File' type is automatically added to file_types if not present when
        using create_filespecs().

    Example:
        >>> spec = FileSpec(
        ...     url="/data/results.csv",
        ...     md5="d41d8cd98f00b204e9800998ecf8427e",
        ...     length=1024,
        ...     description="Analysis results",
        ...     file_types=["CSV", "Data"]
        ... )
    """

    model_config = {"populate_by_name": True}

    url: str = Field(alias="URL")
    md5: str = Field(alias="MD5")
    length: int = Field(alias="Length")
    description: str | None = Field(default="", alias="Description")
    file_types: list[str] | None = Field(default_factory=list)

    @field_validator("url")
    @classmethod
    def validate_file_url(cls, url: str) -> str:
        """Examine the provided URL. If it's a local path, convert it into a tag URL.

        Args:
            url: The URL to validate and potentially convert

        Returns:
            The validated/converted URL

        Raises:
            ValidationError: If the URL is not a file URL
        """
        url_parts = urlparse(url)
        if url_parts.scheme == "tag":
            # Already a tag URL, so just return it.
            return url
        elif (not url_parts.scheme) or url_parts.scheme == "file":
            # There is no scheme part of the URL, or it is a file URL, so it is a local file path.
            # Convert to a tag URL.
            return f"tag://{gethostname()},{date.today()}:file://{url_parts.path}"
        else:
            raise ValueError("url is not a file URL")

    @classmethod
    def create_filespecs(
        cls, path: Path | str, description: str, file_types: list[str] | Callable[[Path], list[str]] | None = None
    ) -> Generator[FileSpec, None, None]:
        """Generate FileSpec objects for a file or directory.

        Creates FileSpec objects with computed MD5 checksums for each file found.
        For directories, recursively processes all files. The 'File' type is
        automatically prepended to file_types if not already present.

        Args:
            path: Path to a file or directory. If directory, all files are processed recursively.
            description: Description to apply to all generated FileSpecs.
            file_types: Either a static list of file types, or a callable that takes a Path
                and returns a list of types for that specific file. Allows dynamic type
                assignment based on file extension, content, etc.

        Yields:
            FileSpec: A specification for each file with computed checksums and metadata.

        Example:
            Static file types:
                >>> specs = FileSpec.create_filespecs("/data/images", "Images", ["Image"])

            Dynamic file types based on extension:
                >>> def get_types(path):
                ...     ext = path.suffix.lower()
                ...     return {".png": ["PNG", "Image"], ".jpg": ["JPEG", "Image"]}.get(ext, [])
                >>> specs = FileSpec.create_filespecs("/data", "Mixed files", get_types)
        """
        path = Path(path)
        file_types = file_types or []
        # Convert static list to callable for uniform handling
        file_types_fn = file_types if callable(file_types) else lambda _x: file_types

        def create_spec(file_path: Path) -> FileSpec:
            """Create a FileSpec for a single file with computed hashes."""
            hashes = hash_utils.compute_file_hashes(file_path, hashes=frozenset(["md5", "sha256"]))
            md5 = hashes["md5"][0]
            type_list = file_types_fn(file_path)
            return FileSpec(
                length=file_path.stat().st_size,
                md5=md5,
                description=description,
                url=file_path.as_posix(),
                # Ensure 'File' type is always included
                file_types=type_list if "File" in type_list else ["File"] + type_list,
            )

        # Handle both single files and directories (recursive)
        files = [path] if path.is_file() else [f for f in Path(path).rglob("*") if f.is_file()]
        return (create_spec(file) for file in files)

    @staticmethod
    def read_filespec(path: Path | str) -> Generator[FileSpec, None, None]:
        """Read FileSpec objects from a JSON Lines file.

        Parses a JSONL file where each line is a JSON object representing a FileSpec.
        Empty lines are skipped. This is useful for batch processing pre-computed
        file specifications.

        Args:
            path: Path to the .jsonl file containing FileSpec data.

        Yields:
            FileSpec: Parsed FileSpec object for each valid line.

        Example:
            >>> for spec in FileSpec.read_filespec("files.jsonl"):
            ...     print(f"{spec.url}: {spec.md5}")
        """
        path = Path(path)
        with path.open("r", encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if not line:
                    continue
                yield FileSpec(**json.loads(line))
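
The URL normalization performed by validate_file_url can be illustrated with the standard library alone. A minimal sketch of the same logic (normalize_file_url is a hypothetical stand-in; the real validator runs inside the Pydantic model):

```python
from datetime import date
from socket import gethostname
from urllib.parse import urlparse

def normalize_file_url(url: str) -> str:
    """Sketch of FileSpec.validate_file_url: local paths become tag URIs."""
    parts = urlparse(url)
    if parts.scheme == "tag":
        # Already a tag URI; return unchanged.
        return url
    if not parts.scheme or parts.scheme == "file":
        # A bare path or file:// URL: stamp it with the hostname and date
        # so the file's origin is uniquely identified.
        return f"tag://{gethostname()},{date.today()}:file://{parts.path}"
    raise ValueError("url is not a file URL")

print(normalize_file_url("/data/results.csv"))
# e.g. tag://myhost,2024-05-01:file:///data/results.csv
```

The tag URI records where and when the file was registered, so two files with the same local path on different machines remain distinguishable in the catalog.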

create_filespecs classmethod

create_filespecs(
    path: Path | str,
    description: str,
    file_types: list[str]
    | Callable[[Path], list[str]]
    | None = None,
) -> Generator[FileSpec, None, None]

Generate FileSpec objects for a file or directory.

Creates FileSpec objects with computed MD5 checksums for each file found. For directories, recursively processes all files. The 'File' type is automatically prepended to file_types if not already present.

Parameters:

  • path (Path | str): Path to a file or directory. If a directory, all files are processed recursively. Required.
  • description (str): Description to apply to all generated FileSpecs. Required.
  • file_types (list[str] | Callable[[Path], list[str]] | None): Either a static list of file types, or a callable that takes a Path and returns a list of types for that specific file. Allows dynamic type assignment based on file extension, content, etc. Defaults to None.

Yields:

  • FileSpec: A specification for each file with computed checksums and metadata.

Example

Static file types:

>>> specs = FileSpec.create_filespecs("/data/images", "Images", ["Image"])

Dynamic file types based on extension:

>>> def get_types(path):
...     ext = path.suffix.lower()
...     return {".png": ["PNG", "Image"], ".jpg": ["JPEG", "Image"]}.get(ext, [])
>>> specs = FileSpec.create_filespecs("/data", "Mixed files", get_types)

Source code in src/deriva_ml/core/filespec.py
@classmethod
def create_filespecs(
    cls, path: Path | str, description: str, file_types: list[str] | Callable[[Path], list[str]] | None = None
) -> Generator[FileSpec, None, None]:
    """Generate FileSpec objects for a file or directory.

    Creates FileSpec objects with computed MD5 checksums for each file found.
    For directories, recursively processes all files. The 'File' type is
    automatically prepended to file_types if not already present.

    Args:
        path: Path to a file or directory. If directory, all files are processed recursively.
        description: Description to apply to all generated FileSpecs.
        file_types: Either a static list of file types, or a callable that takes a Path
            and returns a list of types for that specific file. Allows dynamic type
            assignment based on file extension, content, etc.

    Yields:
        FileSpec: A specification for each file with computed checksums and metadata.

    Example:
        Static file types:
            >>> specs = FileSpec.create_filespecs("/data/images", "Images", ["Image"])

        Dynamic file types based on extension:
            >>> def get_types(path):
            ...     ext = path.suffix.lower()
            ...     return {".png": ["PNG", "Image"], ".jpg": ["JPEG", "Image"]}.get(ext, [])
            >>> specs = FileSpec.create_filespecs("/data", "Mixed files", get_types)
    """
    path = Path(path)
    file_types = file_types or []
    # Convert static list to callable for uniform handling
    file_types_fn = file_types if callable(file_types) else lambda _x: file_types

    def create_spec(file_path: Path) -> FileSpec:
        """Create a FileSpec for a single file with computed hashes."""
        hashes = hash_utils.compute_file_hashes(file_path, hashes=frozenset(["md5", "sha256"]))
        md5 = hashes["md5"][0]
        type_list = file_types_fn(file_path)
        return FileSpec(
            length=file_path.stat().st_size,
            md5=md5,
            description=description,
            url=file_path.as_posix(),
            # Ensure 'File' type is always included
            file_types=type_list if "File" in type_list else ["File"] + type_list,
        )

    # Handle both single files and directories (recursive)
    files = [path] if path.is_file() else [f for f in path.rglob("*") if f.is_file()]
    return (create_spec(file) for file in files)

read_filespec staticmethod

read_filespec(
    path: Path | str,
) -> Generator[FileSpec, None, None]

Read FileSpec objects from a JSON Lines file.

Parses a JSONL file where each line is a JSON object representing a FileSpec. Empty lines are skipped. This is useful for batch processing pre-computed file specifications.

Parameters:

- `path` (`Path | str`): Path to the .jsonl file containing FileSpec data. Required.

Yields:

- `FileSpec`: Parsed FileSpec object for each valid line.

Example:

    >>> for spec in FileSpec.read_filespec("files.jsonl"):
    ...     print(f"{spec.url}: {spec.md5}")

Source code in src/deriva_ml/core/filespec.py
@staticmethod
def read_filespec(path: Path | str) -> Generator[FileSpec, None, None]:
    """Read FileSpec objects from a JSON Lines file.

    Parses a JSONL file where each line is a JSON object representing a FileSpec.
    Empty lines are skipped. This is useful for batch processing pre-computed
    file specifications.

    Args:
        path: Path to the .jsonl file containing FileSpec data.

    Yields:
        FileSpec: Parsed FileSpec object for each valid line.

    Example:
        >>> for spec in FileSpec.read_filespec("files.jsonl"):
        ...     print(f"{spec.url}: {spec.md5}")
    """
    path = Path(path)
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            yield FileSpec(**json.loads(line))
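The JSONL handling above can be exercised standalone. This sketch mirrors the read loop (one JSON object per line, blank lines skipped) with plain dicts standing in for FileSpec instances:

```python
import json
import tempfile
from pathlib import Path

def read_jsonl(path: Path):
    """Yield one parsed object per non-empty line of a JSON Lines file."""
    with path.open("r", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # blank lines are skipped, as in read_filespec
            yield json.loads(line)

with tempfile.TemporaryDirectory() as d:
    p = Path(d) / "files.jsonl"
    # Two records separated by a blank line
    p.write_text('{"url": "a.csv", "md5": "abc"}\n\n{"url": "b.csv", "md5": "def"}\n')
    records = list(read_jsonl(p))
    print(len(records))  # 2 (the blank line is skipped)
```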

validate_file_url classmethod

validate_file_url(url: str) -> str

Examine the provided URL. If it's a local path, convert it into a tag URL.

Parameters:

- `url` (`str`): The URL to validate and potentially convert. Required.

Returns:

- `str`: The validated/converted URL.

Raises:

- `ValidationError`: If the URL is not a file URL.

Source code in src/deriva_ml/core/filespec.py
@field_validator("url")
@classmethod
def validate_file_url(cls, url: str) -> str:
    """Examine the provided URL. If it's a local path, convert it into a tag URL.

    Args:
        url: The URL to validate and potentially convert

    Returns:
        The validated/converted URL

    Raises:
        ValidationError: If the URL is not a file URL
    """
    url_parts = urlparse(url)
    if url_parts.scheme == "tag":
        # Already a tag URL, so just return it.
        return url
    elif (not url_parts.scheme) or url_parts.scheme == "file":
        # There is no scheme part of the URL, or it is a file URL, so it is a local file path.
        # Convert to a tag URL.
        return f"tag://{gethostname()},{date.today()}:file://{url_parts.path}"
    else:
        raise ValueError("url is not a file URL")
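The branching logic above can be tried outside the Pydantic validator. This standalone sketch (hypothetical `to_tag_url`) reproduces the same three cases: tag URLs pass through, bare paths and `file://` URLs are rewritten, and any other scheme is rejected.

```python
from datetime import date
from socket import gethostname
from urllib.parse import urlparse

def to_tag_url(url: str) -> str:
    """Convert a local path or file:// URL into a tag URL; reject other schemes."""
    parts = urlparse(url)
    if parts.scheme == "tag":
        return url  # already a tag URL
    if not parts.scheme or parts.scheme == "file":
        # Local path: embed hostname and today's date in the tag URL
        return f"tag://{gethostname()},{date.today()}:file://{parts.path}"
    raise ValueError("url is not a file URL")

print(to_tag_url("/data/images/cell.png").startswith("tag://"))  # True
```

Note that the resulting tag URL embeds the hostname and current date, so it is not stable across machines or days.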

FileUploadState

Bases: BaseModel

Tracks the state and result of a file upload operation.

Attributes:

- `state` (`UploadState`): Current state of the upload (success, failed, etc.).
- `status` (`str`): Detailed status message.
- `result` (`Any`): Upload result data, if any.

Source code in src/deriva_ml/core/ermrest.py
class FileUploadState(BaseModel):
    """Tracks the state and result of a file upload operation.

    Attributes:
        state (UploadState): Current state of the upload (success, failed, etc.).
        status (str): Detailed status message.
        result (Any): Upload result data, if any.
    """
    state: UploadState
    status: str
    result: Any

    @computed_field
    @property
    def rid(self) -> RID | None:
        """RID of the uploaded record, or None if no result is available."""
        return self.result["RID"] if self.result else None

MLAsset

Bases: BaseStrEnum

Asset type identifiers.

Defines the types of assets that can be associated with executions.

Attributes:

- `execution_metadata` (`str`): Metadata about an execution.
- `execution_asset` (`str`): Asset produced by an execution.

Source code in src/deriva_ml/core/enums.py
class MLAsset(BaseStrEnum):
    """Asset type identifiers.

    Defines the types of assets that can be associated with executions.

    Attributes:
        execution_metadata (str): Metadata about an execution.
        execution_asset (str): Asset produced by an execution.
    """

    execution_metadata = "Execution_Metadata"
    execution_asset = "Execution_Asset"

MLTable

Bases: BaseStrEnum

Core ML schema table identifiers.

Defines the names of the core tables in the deriva-ml schema. These tables form the backbone of the ML workflow tracking system.

Attributes:

- `dataset` (`str`): Dataset table for versioned data collections.
- `workflow` (`str`): Workflow table for computational pipeline definitions.
- `file` (`str`): File table for tracking individual files.
- `asset` (`str`): Asset table for domain-specific file types.
- `execution` (`str`): Execution table for workflow run tracking.
- `execution_execution` (`str`): Execution_Execution table for nested executions.
- `dataset_version` (`str`): Dataset_Version table for version history.
- `execution_metadata` (`str`): Execution_Metadata table for run metadata.
- `execution_asset` (`str`): Execution_Asset table for run outputs.

Source code in src/deriva_ml/core/enums.py
class MLTable(BaseStrEnum):
    """Core ML schema table identifiers.

    Defines the names of the core tables in the deriva-ml schema. These tables
    form the backbone of the ML workflow tracking system.

    Attributes:
        dataset (str): Dataset table for versioned data collections.
        workflow (str): Workflow table for computational pipeline definitions.
        file (str): File table for tracking individual files.
        asset (str): Asset table for domain-specific file types.
        execution (str): Execution table for workflow run tracking.
        execution_execution (str): Execution_Execution table for nested executions.
        dataset_version (str): Dataset_Version table for version history.
        execution_metadata (str): Execution_Metadata table for run metadata.
        execution_asset (str): Execution_Asset table for run outputs.
    """

    dataset = "Dataset"
    workflow = "Workflow"
    file = "File"
    asset = "Asset"
    execution = "Execution"
    execution_execution = "Execution_Execution"
    dataset_version = "Dataset_Version"
    execution_metadata = "Execution_Metadata"
    execution_asset = "Execution_Asset"

MLVocab

Bases: BaseStrEnum

Controlled vocabulary table identifiers.

Defines the names of controlled vocabulary tables used in DerivaML. These tables store standardized terms with descriptions and synonyms for consistent data classification across the catalog.

Attributes:

- `dataset_type` (`str`): Dataset classification vocabulary (e.g., "Training", "Test").
- `workflow_type` (`str`): Workflow classification vocabulary (e.g., "Python", "Notebook").
- `asset_type` (`str`): Asset/file type classification vocabulary (e.g., "Image", "CSV").
- `asset_role` (`str`): Asset role vocabulary for execution relationships (e.g., "Input", "Output").
- `feature_name` (`str`): Feature name vocabulary for ML feature definitions.

Source code in src/deriva_ml/core/enums.py
class MLVocab(BaseStrEnum):
    """Controlled vocabulary table identifiers.

    Defines the names of controlled vocabulary tables used in DerivaML. These tables
    store standardized terms with descriptions and synonyms for consistent data
    classification across the catalog.

    Attributes:
        dataset_type (str): Dataset classification vocabulary (e.g., "Training", "Test").
        workflow_type (str): Workflow classification vocabulary (e.g., "Python", "Notebook").
        asset_type (str): Asset/file type classification vocabulary (e.g., "Image", "CSV").
        asset_role (str): Asset role vocabulary for execution relationships (e.g., "Input", "Output").
        feature_name (str): Feature name vocabulary for ML feature definitions.
    """

    dataset_type = "Dataset_Type"
    workflow_type = "Workflow_Type"
    asset_type = "Asset_Type"
    asset_role = "Asset_Role"
    feature_name = "Feature_Name"

Status

Bases: BaseStrEnum

Execution status values.

Represents the various states an execution can be in throughout its lifecycle.

Attributes:

- `initializing` (`str`): Initial setup is in progress.
- `created` (`str`): Execution record has been created.
- `pending` (`str`): Execution is queued.
- `running` (`str`): Execution is in progress.
- `aborted` (`str`): Execution was manually stopped.
- `completed` (`str`): Execution finished successfully.
- `failed` (`str`): Execution encountered an error.

Source code in src/deriva_ml/core/enums.py
class Status(BaseStrEnum):
    """Execution status values.

    Represents the various states an execution can be in throughout its lifecycle.

    Attributes:
        initializing (str): Initial setup is in progress.
        created (str): Execution record has been created.
        pending (str): Execution is queued.
        running (str): Execution is in progress.
        aborted (str): Execution was manually stopped.
        completed (str): Execution finished successfully.
        failed (str): Execution encountered an error.
    """

    initializing = "Initializing"
    created = "Created"
    pending = "Pending"
    running = "Running"
    aborted = "Aborted"
    completed = "Completed"
    failed = "Failed"
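BaseStrEnum is not shown here; assuming it mixes `str` into `Enum` (as its string-valued members suggest), members compare equal to their catalog string values and can be looked up by value. A minimal sketch of that behavior:

```python
from enum import Enum

# Sketch only: a str-mixin enum with two of the Status values above.
class Status(str, Enum):
    running = "Running"
    completed = "Completed"

print(Status.running == "Running")              # True: compares equal to its value
print(Status("Completed") is Status.completed)  # True: lookup by value
```

This is what lets code compare an execution's stored status string directly against an enum member without explicit `.value` access.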

UploadCallback

Bases: Protocol

Protocol for upload progress callbacks.

Implement this protocol to receive progress updates during file uploads. The callback is invoked with an UploadProgress object containing current upload state information.

Example:

    >>> def my_callback(progress: UploadProgress) -> None:
    ...     print(f"Uploading {progress.file_name}: {progress.percent_complete:.1f}%")
    ...
    >>> execution.upload_execution_outputs(progress_callback=my_callback)

Source code in src/deriva_ml/core/ermrest.py
class UploadCallback(Protocol):
    """Protocol for upload progress callbacks.

    Implement this protocol to receive progress updates during file uploads.
    The callback is invoked with an UploadProgress object containing current
    upload state information.

    Example:
        >>> def my_callback(progress: UploadProgress) -> None:
        ...     print(f"Uploading {progress.file_name}: {progress.percent_complete:.1f}%")
        ...
        >>> execution.upload_execution_outputs(progress_callback=my_callback)
    """
    def __call__(self, progress: UploadProgress) -> None:
        """Called with upload progress information.

        Args:
            progress: Current upload progress state.
        """
        ...

__call__

__call__(
    progress: UploadProgress,
) -> None

Called with upload progress information.

Parameters:

- `progress` (`UploadProgress`): Current upload progress state. Required.
Source code in src/deriva_ml/core/ermrest.py
def __call__(self, progress: UploadProgress) -> None:
    """Called with upload progress information.

    Args:
        progress: Current upload progress state.
    """
    ...

UploadProgress dataclass

Progress information for file uploads.

This dataclass is passed to upload callbacks to report progress during file upload operations.

Attributes:

- `file_path` (`str`): Path to the file being uploaded.
- `file_name` (`str`): Name of the file being uploaded.
- `bytes_completed` (`int`): Number of bytes uploaded so far.
- `bytes_total` (`int`): Total number of bytes to upload.
- `percent_complete` (`float`): Percentage of upload completed (0-100).
- `phase` (`str`): Current phase of the upload operation.
- `message` (`str`): Human-readable status message.

Source code in src/deriva_ml/core/ermrest.py
@dataclass
class UploadProgress:
    """Progress information for file uploads.

    This dataclass is passed to upload callbacks to report progress during
    file upload operations.

    Attributes:
        file_path: Path to the file being uploaded.
        file_name: Name of the file being uploaded.
        bytes_completed: Number of bytes uploaded so far.
        bytes_total: Total number of bytes to upload.
        percent_complete: Percentage of upload completed (0-100).
        phase: Current phase of the upload operation.
        message: Human-readable status message.
    """
    file_path: str = ""
    file_name: str = ""
    bytes_completed: int = 0
    bytes_total: int = 0
    percent_complete: float = 0.0
    phase: str = ""
    message: str = ""
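Because UploadCallback is a Protocol, any callable accepting an UploadProgress satisfies it. This sketch (hypothetical `ProgressCollector`; the dataclass mirrors the fields documented above) shows a callback that records every update rather than printing:

```python
from dataclasses import dataclass

# Mirror of the documented UploadProgress fields, for a self-contained example.
@dataclass
class UploadProgress:
    file_path: str = ""
    file_name: str = ""
    bytes_completed: int = 0
    bytes_total: int = 0
    percent_complete: float = 0.0
    phase: str = ""
    message: str = ""

class ProgressCollector:
    """Callable matching the UploadCallback protocol; keeps a history of updates."""
    def __init__(self) -> None:
        self.events: list[UploadProgress] = []

    def __call__(self, progress: UploadProgress) -> None:
        self.events.append(progress)

cb = ProgressCollector()
cb(UploadProgress(file_name="a.csv", bytes_completed=50, bytes_total=100, percent_complete=50.0))
print(cb.events[-1].percent_complete)  # 50.0
```

Collecting events like this is useful in tests or when driving a GUI progress bar, where printing is not an option.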

UploadState

Bases: Enum

File upload operation states.

Represents the various states a file upload operation can be in, from initiation to completion.

Attributes:

- `success` (`int`): Upload completed successfully.
- `failed` (`int`): Upload failed.
- `pending` (`int`): Upload is queued.
- `running` (`int`): Upload is in progress.
- `paused` (`int`): Upload is temporarily paused.
- `aborted` (`int`): Upload was aborted.
- `cancelled` (`int`): Upload was cancelled.
- `timeout` (`int`): Upload timed out.

Source code in src/deriva_ml/core/enums.py
class UploadState(Enum):
    """File upload operation states.

    Represents the various states a file upload operation can be in, from initiation to completion.

    Attributes:
        success (int): Upload completed successfully.
        failed (int): Upload failed.
        pending (int): Upload is queued.
        running (int): Upload is in progress.
        paused (int): Upload is temporarily paused.
        aborted (int): Upload was aborted.
        cancelled (int): Upload was cancelled.
        timeout (int): Upload timed out.
    """

    success = 0
    failed = 1
    pending = 2
    running = 3
    paused = 4
    aborted = 5
    cancelled = 6
    timeout = 7

VocabularyTerm

Bases: BaseModel

Represents a term in a controlled vocabulary.

A vocabulary term is a standardized entry in a controlled vocabulary table. Each term has a primary name, optional synonyms, and identifiers for cross-referencing.

Attributes:

- `name` (`str`): Primary name of the term.
- `synonyms` (`list[str] | None`): Alternative names for the term.
- `id` (`str`): CURIE (Compact URI) identifier.
- `uri` (`str`): Full URI for the term.
- `description` (`str`): Explanation of the term's meaning.
- `rid` (`str`): Resource identifier in the catalog.

Example:

    >>> term = VocabularyTerm(
    ...     Name="epithelial",
    ...     Synonyms=["epithelium"],
    ...     ID="tissue:0001",
    ...     URI="http://example.org/tissue/0001",
    ...     Description="Epithelial tissue type",
    ...     RID="1-abc123"
    ... )

Source code in src/deriva_ml/core/ermrest.py
class VocabularyTerm(BaseModel):
    """Represents a term in a controlled vocabulary.

    A vocabulary term is a standardized entry in a controlled vocabulary table. Each term has
    a primary name, optional synonyms, and identifiers for cross-referencing.

    Attributes:
        name (str): Primary name of the term.
        synonyms (list[str] | None): Alternative names for the term.
        id (str): CURIE (Compact URI) identifier.
        uri (str): Full URI for the term.
        description (str): Explanation of the term's meaning.
        rid (str): Resource identifier in the catalog.

    Example:
        >>> term = VocabularyTerm(
        ...     Name="epithelial",
        ...     Synonyms=["epithelium"],
        ...     ID="tissue:0001",
        ...     URI="http://example.org/tissue/0001",
        ...     Description="Epithelial tissue type",
        ...     RID="1-abc123"
        ... )
    """
    _name: str = PrivateAttr()
    _synonyms: list[str] | None = PrivateAttr()
    _description: str = PrivateAttr()
    id: str = Field(validation_alias=AliasChoices("ID", "id"))
    uri: str = Field(validation_alias=AliasChoices("URI", "uri"))
    rid: str = Field(validation_alias=AliasChoices("RID", "rid"))

    def __init__(self, **data):
        # Extract fields that will be private attrs before calling super
        name = data.pop("Name", None) or data.pop("name", None)
        synonyms = data.pop("Synonyms", None) or data.pop("synonyms", None)
        description = data.pop("Description", None) or data.pop("description", None)
        super().__init__(**data)
        self._name = name
        self._synonyms = synonyms
        self._description = description

    @property
    def name(self) -> str:
        """Primary name of the term."""
        return self._name

    @property
    def synonyms(self) -> tuple[str, ...]:
        """Alternative names for the term (immutable)."""
        return tuple(self._synonyms or [])

    @property
    def description(self) -> str:
        """Explanation of the term's meaning."""
        return self._description

    class Config:
        extra = "ignore"

description property

description: str

Explanation of the term's meaning.

name property

name: str

Primary name of the term.

synonyms property

synonyms: tuple[str, ...]

Alternative names for the term (immutable).

VocabularyTermHandle

Bases: VocabularyTerm

A VocabularyTerm with methods to modify it in the catalog.

This class extends VocabularyTerm to provide mutable access to vocabulary terms. Changes made through property setters are persisted to the catalog.

The synonyms property returns a tuple (immutable) to prevent accidental modification without catalog update. To modify synonyms, assign a new tuple/list to the property.

Example:

    >>> term = ml.lookup_term("Dataset_Type", "Training")
    >>> term.description = "Data used for model training"
    >>> term.synonyms = ("Train", "TrainingData")
    >>> term.delete()

Source code in src/deriva_ml/core/ermrest.py
class VocabularyTermHandle(VocabularyTerm):
    """A VocabularyTerm with methods to modify it in the catalog.

    This class extends VocabularyTerm to provide mutable access to vocabulary
    terms. Changes made through property setters are persisted to the catalog.

    The `synonyms` property returns a tuple (immutable) to prevent accidental
    modification without catalog update. To modify synonyms, assign a new
    tuple/list to the property.

    Attributes:
        Inherits all attributes from VocabularyTerm.

    Example:
        >>> term = ml.lookup_term("Dataset_Type", "Training")
        >>> term.description = "Data used for model training"
        >>> term.synonyms = ("Train", "TrainingData")
        >>> term.delete()
    """

    _ml: Any = PrivateAttr()
    _table: str = PrivateAttr()

    def __init__(self, ml: Any, table: str, **data):
        """Initialize a VocabularyTermHandle.

        Args:
            ml: DerivaML instance for catalog operations.
            table: Name of the vocabulary table containing this term.
            **data: Term data (Name, Synonyms, Description, ID, URI, RID).
        """
        super().__init__(**data)
        self._ml = ml
        self._table = table

    @property
    def description(self) -> str:
        """Explanation of the term's meaning."""
        return self._description

    @description.setter
    def description(self, value: str) -> None:
        """Update the term's description in the catalog.

        Args:
            value: New description for the term.
        """
        self._ml._update_term_description(self._table, self.name, value)
        self._description = value

    @property
    def synonyms(self) -> tuple[str, ...]:
        """Alternative names for the term (immutable).

        Returns a tuple to prevent accidental modification without catalog update.
        To modify synonyms, assign a new tuple/list to this property.
        """
        return tuple(self._synonyms or [])

    @synonyms.setter
    def synonyms(self, value: list[str] | tuple[str, ...]) -> None:
        """Replace all synonyms for this term in the catalog.

        Args:
            value: New list of synonyms (replaces all existing synonyms).
        """
        new_synonyms = list(value)
        self._ml._update_term_synonyms(self._table, self.name, new_synonyms)
        self._synonyms = new_synonyms

    def delete(self) -> None:
        """Delete this term from the vocabulary.

        Raises:
            DerivaMLException: If the term is currently in use by other records.
        """
        self._ml.delete_term(self._table, self.name)

description property writable

description: str

Explanation of the term's meaning.

name property

name: str

Primary name of the term.

synonyms property writable

synonyms: tuple[str, ...]

Alternative names for the term (immutable).

Returns a tuple to prevent accidental modification without catalog update. To modify synonyms, assign a new tuple/list to this property.

__init__

__init__(ml: Any, table: str, **data)

Initialize a VocabularyTermHandle.

Parameters:

- `ml` (`Any`): DerivaML instance for catalog operations. Required.
- `table` (`str`): Name of the vocabulary table containing this term. Required.
- `**data`: Term data (Name, Synonyms, Description, ID, URI, RID). Defaults to `{}`.
Source code in src/deriva_ml/core/ermrest.py
def __init__(self, ml: Any, table: str, **data):
    """Initialize a VocabularyTermHandle.

    Args:
        ml: DerivaML instance for catalog operations.
        table: Name of the vocabulary table containing this term.
        **data: Term data (Name, Synonyms, Description, ID, URI, RID).
    """
    super().__init__(**data)
    self._ml = ml
    self._table = table

delete

delete() -> None

Delete this term from the vocabulary.

Raises:

- `DerivaMLException`: If the term is currently in use by other records.

Source code in src/deriva_ml/core/ermrest.py
def delete(self) -> None:
    """Delete this term from the vocabulary.

    Raises:
        DerivaMLException: If the term is currently in use by other records.
    """
    self._ml.delete_term(self._table, self.name)
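The write-through setter pattern used by VocabularyTermHandle (persist to the catalog first, then update the cached copy) can be sketched with an in-memory stand-in. `FakeCatalog` and `TermHandle` here are hypothetical, not the library classes:

```python
class FakeCatalog:
    """In-memory stand-in for the catalog backing store."""
    def __init__(self) -> None:
        self.rows: dict[str, dict] = {}

    def update_description(self, table: str, name: str, value: str) -> None:
        self.rows.setdefault(f"{table}/{name}", {})["Description"] = value

class TermHandle:
    def __init__(self, catalog: FakeCatalog, table: str, name: str, description: str):
        self._catalog, self._table = catalog, table
        self._name, self._description = name, description

    @property
    def description(self) -> str:
        return self._description

    @description.setter
    def description(self, value: str) -> None:
        # Persist first; if this raises, the local cache stays consistent
        self._catalog.update_description(self._table, self._name, value)
        self._description = value

cat = FakeCatalog()
term = TermHandle(cat, "Dataset_Type", "Training", "old")
term.description = "Data used for model training"
print(cat.rows["Dataset_Type/Training"]["Description"])
```

The persist-then-cache ordering means a failed catalog write leaves the local copy untouched, so the handle never claims a value the catalog rejected.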

get_domain_schemas

get_domain_schemas(
    all_schemas: set[str] | list[str],
    ml_schema: str = ML_SCHEMA,
) -> frozenset[str]

Return all domain schemas from a collection of schema names.

Filters out system schemas (public, www, WWW) and the ML schema to return only user-defined domain schemas.

Parameters:

- `all_schemas` (`set[str] | list[str]`): Collection of schema names to filter. Required.
- `ml_schema` (`str`): Name of the ML schema to exclude. Defaults to `ML_SCHEMA` ('deriva-ml').

Returns:

- `frozenset[str]`: Frozen set of domain schema names.

Example:

    >>> get_domain_schemas(["public", "deriva-ml", "my_project", "www"])
    frozenset({'my_project'})

Source code in src/deriva_ml/core/constants.py
def get_domain_schemas(all_schemas: set[str] | list[str], ml_schema: str = ML_SCHEMA) -> frozenset[str]:
    """Return all domain schemas from a collection of schema names.

    Filters out system schemas (public, www, WWW) and the ML schema to return
    only user-defined domain schemas.

    Args:
        all_schemas: Collection of schema names to filter.
        ml_schema: Name of the ML schema to exclude (default: 'deriva-ml').

    Returns:
        Frozen set of domain schema names.

    Example:
        >>> get_domain_schemas(["public", "deriva-ml", "my_project", "www"])
        frozenset({'my_project'})
    """
    return frozenset(s for s in all_schemas if not is_system_schema(s, ml_schema))

is_system_schema

is_system_schema(
    schema_name: str,
    ml_schema: str = ML_SCHEMA,
) -> bool

Check if a schema is a system or ML schema (not a domain schema).

System schemas are Deriva infrastructure schemas (public, www, WWW) and the ML schema (deriva-ml by default). Domain schemas are user-defined schemas containing business logic tables.

Parameters:

- `schema_name` (`str`): Name of the schema to check. Required.
- `ml_schema` (`str`): Name of the ML schema. Defaults to `ML_SCHEMA` ('deriva-ml').

Returns:

- `bool`: True if the schema is a system or ML schema, False if it's a domain schema.

Example:

    >>> is_system_schema("public")
    True
    >>> is_system_schema("deriva-ml")
    True
    >>> is_system_schema("my_project")
    False

Source code in src/deriva_ml/core/constants.py
def is_system_schema(schema_name: str, ml_schema: str = ML_SCHEMA) -> bool:
    """Check if a schema is a system or ML schema (not a domain schema).

    System schemas are Deriva infrastructure schemas (public, www, WWW) and the
    ML schema (deriva-ml by default). Domain schemas are user-defined schemas
    containing business logic tables.

    Args:
        schema_name: Name of the schema to check.
        ml_schema: Name of the ML schema (default: 'deriva-ml').

    Returns:
        True if the schema is a system or ML schema, False if it's a domain schema.

    Example:
        >>> is_system_schema("public")
        True
        >>> is_system_schema("deriva-ml")
        True
        >>> is_system_schema("my_project")
        False
    """
    return schema_name.lower() in {s.lower() for s in SYSTEM_SCHEMAS} or schema_name == ml_schema
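The two helpers compose: `get_domain_schemas` is just the set of names for which `is_system_schema` is false. A self-contained sketch, assuming `SYSTEM_SCHEMAS` contains `public`, `www`, and `WWW` as the docstrings describe:

```python
# Assumed constant, per the docstrings above; the real value lives in
# deriva_ml.core.constants.
SYSTEM_SCHEMAS = {"public", "www", "WWW"}
ML_SCHEMA = "deriva-ml"

def is_system_schema(schema_name: str, ml_schema: str = ML_SCHEMA) -> bool:
    """True for Deriva infrastructure schemas and the ML schema itself."""
    return schema_name.lower() in {s.lower() for s in SYSTEM_SCHEMAS} or schema_name == ml_schema

def get_domain_schemas(all_schemas, ml_schema: str = ML_SCHEMA) -> frozenset:
    """Keep only user-defined domain schemas."""
    return frozenset(s for s in all_schemas if not is_system_schema(s, ml_schema))

print(get_domain_schemas(["public", "deriva-ml", "my_project", "www"]))  # frozenset({'my_project'})
```

Note the asymmetry: system schema names are matched case-insensitively, while the ML schema name must match exactly.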