ExecutionConfiguration class

Configuration management for DerivaML executions.

This module provides functionality for configuring and managing execution parameters in DerivaML. It includes:

  • ExecutionConfiguration class: Core class for execution settings
  • Parameter validation: Accepts parameters as a dictionary or as a path to a JSON file
  • Dataset specifications: Manages dataset versions and materialization
  • Asset management: Tracks required input files

The module supports both direct parameter specification and JSON-based configuration files.

Typical usage example

    config = ExecutionConfiguration(
        workflow="analysis_workflow",
        datasets=[DatasetSpec(rid="1-abc123", version="1.0.0")],
        parameters={"threshold": 0.5},
        description="Process sample data"
    )
    execution = ml.create_execution(config)

ExecutionConfiguration

Bases: BaseModel

Configuration for a DerivaML execution.

Defines the complete configuration for a computational or manual process in DerivaML, including required datasets, input assets, workflow definition, and parameters.

Attributes:

  • datasets (list[DatasetSpec]): Dataset specifications, each containing:
      - rid: Dataset Resource Identifier
      - version: Version to use
      - materialize: Whether to extract dataset contents
  • assets (list[RID]): Resource Identifiers of required input assets.
  • workflow (RID | Workflow): Workflow definition or its Resource Identifier.
  • parameters (dict[str, Any] | Path): Execution parameters, either as:
      - Dictionary of parameter values
      - Path to JSON file containing parameters
  • description (str): Description of execution purpose (supports Markdown).
  • argv (list[str]): Command line arguments used to start execution.

Example

    config = ExecutionConfiguration(
        workflow=Workflow.create_workflow("analysis", "python_script"),
        datasets=[
            DatasetSpec(rid="1-abc123", version="1.0.0", materialize=True)
        ],
        parameters={"threshold": 0.5, "max_iterations": 100},
        description="Process RNA sequence data"
    )
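The `parameters` attribute accepts either a dictionary or a path to a JSON file. The following is a minimal, stdlib-only sketch of that normalization, modeled on the `validate_parameters` logic in the source listing. The helper name `normalize_parameters` and the temporary file are hypothetical and not part of DerivaML.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def normalize_parameters(value: Any) -> Any:
    """Hypothetical helper: load JSON when given a path, else pass through."""
    if isinstance(value, (str, Path)):
        with Path(value).open("r") as f:
            return json.load(f)
    return value


# Case 1: parameters given directly as a dictionary.
direct = normalize_parameters({"threshold": 0.5, "max_iterations": 100})

# Case 2: the same parameters stored in a JSON file and given as a path.
with tempfile.TemporaryDirectory() as tmp:
    params_file = Path(tmp) / "params.json"
    params_file.write_text(json.dumps({"threshold": 0.5, "max_iterations": 100}))
    from_file = normalize_parameters(params_file)

print(direct == from_file)  # True
```

Note that under this logic every string is treated as a file path, so a plain string cannot itself be a parameter value.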

Source code in src/deriva_ml/execution/execution_configuration.py
class ExecutionConfiguration(BaseModel):
    """Configuration for a DerivaML execution.

    Defines the complete configuration for a computational or manual process in DerivaML,
    including required datasets, input assets, workflow definition, and parameters.

    Attributes:
        datasets (list[DatasetSpec]): Dataset specifications, each containing:
            - rid: Dataset Resource Identifier
            - version: Version to use
            - materialize: Whether to extract dataset contents
        assets (list[RID]): Resource Identifiers of required input assets.
        workflow (RID | Workflow): Workflow definition or its Resource Identifier.
        parameters (dict[str, Any] | Path): Execution parameters, either as:
            - Dictionary of parameter values
            - Path to JSON file containing parameters
        description (str): Description of execution purpose (supports Markdown).
        argv (list[str]): Command line arguments used to start execution.

    Example:
        >>> config = ExecutionConfiguration(
        ...     workflow=Workflow.create_workflow("analysis", "python_script"),
        ...     datasets=[
        ...         DatasetSpec(rid="1-abc123", version="1.0.0", materialize=True)
        ...     ],
        ...     parameters={"threshold": 0.5, "max_iterations": 100},
        ...     description="Process RNA sequence data"
        ... )
    """

    datasets: list[DatasetSpec] = []
    assets: list[RID] = []
    workflow: RID | Workflow
    parameters: dict[str, Any] | Path = {}
    description: str = ""
    argv: list[str] = Field(default_factory=lambda: sys.argv)

    model_config = ConfigDict(arbitrary_types_allowed=True)

    @field_validator("parameters", mode="before")
    @classmethod
    def validate_parameters(cls, value: Any) -> Any:
        """Validates and loads execution parameters.

        If value is a file path, loads and parses it as JSON. Otherwise, returns
        the value as is.

        Args:
            value: Parameter value to validate, either:
                - Dictionary of parameters
                - Path to JSON file
                - String path to JSON file

        Returns:
            dict[str, Any]: Validated parameter dictionary.

        Raises:
            ValueError: If JSON file is invalid or cannot be read.
            FileNotFoundError: If parameter file doesn't exist.

        Example:
            >>> config = ExecutionConfiguration(workflow="analysis_workflow", parameters="params.json")
            >>> print(config.parameters)  # Contents of params.json as dict
        """
        if isinstance(value, str) or isinstance(value, Path):
            with Path(value).open("r") as f:
                return json.load(f)
        else:
            return value

    @field_validator("workflow", mode="before")
    @classmethod
    def validate_workflow(cls, value: Any) -> Any:
        """Validates workflow specification.

        Args:
            value: Workflow value to validate (RID or Workflow object).

        Returns:
            RID | Workflow: Validated workflow specification.
        """
        return value

    @staticmethod
    def load_configuration(path: Path) -> ExecutionConfiguration:
        """Creates an ExecutionConfiguration from a JSON file.

        Loads and parses a JSON configuration file into an ExecutionConfiguration
        instance. The file should contain a valid configuration specification.

        Args:
            path: Path to JSON configuration file.

        Returns:
            ExecutionConfiguration: Loaded configuration instance.

        Raises:
            ValueError: If JSON file is invalid or missing required fields.
            FileNotFoundError: If configuration file doesn't exist.

        Example:
            >>> config = ExecutionConfiguration.load_configuration(Path("config.json"))
            >>> print(f"Workflow: {config.workflow}")
            >>> print(f"Datasets: {len(config.datasets)}")
        """
        with Path(path).open() as fd:
            config = json.load(fd)
        return ExecutionConfiguration.model_validate(config)

load_configuration staticmethod

load_configuration(
    path: Path,
) -> ExecutionConfiguration

Creates an ExecutionConfiguration from a JSON file.

Loads and parses a JSON configuration file into an ExecutionConfiguration instance. The file should contain a valid configuration specification.

Parameters:

  • path (Path, required): Path to JSON configuration file.

Returns:

  • ExecutionConfiguration: Loaded configuration instance.

Raises:

  • ValueError: If JSON file is invalid or missing required fields.
  • FileNotFoundError: If configuration file doesn't exist.

Example

    config = ExecutionConfiguration.load_configuration(Path("config.json"))
    print(f"Workflow: {config.workflow}")
    print(f"Datasets: {len(config.datasets)}")
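As an illustration of what such a configuration file might contain, the sketch below writes a JSON file whose keys follow the documented attribute names and reads it back with the same two steps `load_configuration` uses (open the file, parse the JSON). The file contents are hypothetical, and it is an assumption that `DatasetSpec` accepts the `rid`, `version`, and `materialize` keys in this plain form. The final `model_validate` step is left as a comment so the example runs without DerivaML installed.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical file contents; keys follow the attribute names documented above.
# Only "workflow" has no default in the class listing, so it is the one
# required key.
config_dict = {
    "workflow": "analysis_workflow",
    "datasets": [{"rid": "1-abc123", "version": "1.0.0", "materialize": True}],
    "parameters": {"threshold": 0.5},
    "description": "Process sample data",
}

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / "config.json"
    config_path.write_text(json.dumps(config_dict, indent=2))

    # Same two steps as load_configuration: open the file, parse the JSON.
    with config_path.open() as fd:
        loaded = json.load(fd)

# With deriva_ml installed, `loaded` is the dictionary that would be passed to
# ExecutionConfiguration.model_validate(...).
print(sorted(loaded))  # ['datasets', 'description', 'parameters', 'workflow']
```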

Source code in src/deriva_ml/execution/execution_configuration.py
@staticmethod
def load_configuration(path: Path) -> ExecutionConfiguration:
    """Creates an ExecutionConfiguration from a JSON file.

    Loads and parses a JSON configuration file into an ExecutionConfiguration
    instance. The file should contain a valid configuration specification.

    Args:
        path: Path to JSON configuration file.

    Returns:
        ExecutionConfiguration: Loaded configuration instance.

    Raises:
        ValueError: If JSON file is invalid or missing required fields.
        FileNotFoundError: If configuration file doesn't exist.

    Example:
        >>> config = ExecutionConfiguration.load_configuration(Path("config.json"))
        >>> print(f"Workflow: {config.workflow}")
        >>> print(f"Datasets: {len(config.datasets)}")
    """
    with Path(path).open() as fd:
        config = json.load(fd)
    return ExecutionConfiguration.model_validate(config)

validate_parameters classmethod

validate_parameters(value: Any) -> Any

Validates and loads execution parameters.

If value is a file path, loads and parses it as JSON. Otherwise, returns the value as is.

Parameters:

  • value (Any, required): Parameter value to validate, either:
      - Dictionary of parameters
      - Path to JSON file
      - String path to JSON file

Returns:

  • dict[str, Any] (annotated as Any): Validated parameter dictionary.

Raises:

  • ValueError: If JSON file is invalid or cannot be read.
  • FileNotFoundError: If parameter file doesn't exist.

Example

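    # workflow is a required field, so it is supplied alongside parameters.
    config = ExecutionConfiguration(workflow="analysis_workflow", parameters="params.json")
    print(config.parameters)  # Contents of params.json as dict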
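The two documented failure modes can be reproduced with the standard library alone. The sketch below repeats the open-and-parse step from the validator against a missing file and a malformed file; the file names are hypothetical. It shows only the exceptions raised by `open` and `json.load` themselves, and does not cover how pydantic reports an error raised inside a validator.

```python
import json
import tempfile
from pathlib import Path

# json.JSONDecodeError subclasses ValueError, so "except ValueError" catches it.
print(issubclass(json.JSONDecodeError, ValueError))  # True

outcomes = {}
with tempfile.TemporaryDirectory() as tmp:
    missing = Path(tmp) / "missing.json"   # never created
    broken = Path(tmp) / "broken.json"     # exists but is not valid JSON
    broken.write_text("{not valid json")

    for candidate in (missing, broken):
        try:
            with candidate.open("r") as f:
                json.load(f)
            outcomes[candidate.name] = "ok"
        except FileNotFoundError:
            outcomes[candidate.name] = "FileNotFoundError"
        except ValueError:
            outcomes[candidate.name] = "ValueError"

print(outcomes)
```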

Source code in src/deriva_ml/execution/execution_configuration.py
@field_validator("parameters", mode="before")
@classmethod
def validate_parameters(cls, value: Any) -> Any:
    """Validates and loads execution parameters.

    If value is a file path, loads and parses it as JSON. Otherwise, returns
    the value as is.

    Args:
        value: Parameter value to validate, either:
            - Dictionary of parameters
            - Path to JSON file
            - String path to JSON file

    Returns:
        dict[str, Any]: Validated parameter dictionary.

    Raises:
        ValueError: If JSON file is invalid or cannot be read.
        FileNotFoundError: If parameter file doesn't exist.

    Example:
        >>> config = ExecutionConfiguration(workflow="analysis_workflow", parameters="params.json")
        >>> print(config.parameters)  # Contents of params.json as dict
    """
    if isinstance(value, str) or isinstance(value, Path):
        with Path(value).open("r") as f:
            return json.load(f)
    else:
        return value

validate_workflow classmethod

validate_workflow(value: Any) -> Any

Validates workflow specification.

Parameters:

  • value (Any, required): Workflow value to validate (RID or Workflow object).

Returns:

  • RID | Workflow (annotated as Any): Validated workflow specification.

Source code in src/deriva_ml/execution/execution_configuration.py
@field_validator("workflow", mode="before")
@classmethod
def validate_workflow(cls, value: Any) -> Any:
    """Validates workflow specification.

    Args:
        value: Workflow value to validate (RID or Workflow object).

    Returns:
        RID | Workflow: Validated workflow specification.
    """
    return value