library.pipeline.pipeline_manager module

class library.pipeline.pipeline_manager.PipelineManager(pipelines: dict[str, dict[str, Pipeline]], serializer_type: str = 'joblib', variables: dict = None)[source]

Bases: object

Trains and evaluates all pipelines.

all_pipelines_execute(methodName: str, verbose: bool = False, exclude_category: str = '', exclude_pipeline_names: list[str] = [], **kwargs)[source]

Executes a method on all pipelines, using threads for parallelization. The method name can use dot notation for nested attributes (e.g. “model.fit”).

Note on verbose: if a given pipeline does not appear in the results, it has already been processed (it is a copy of another pipeline).

Parameters:
  • methodName (str) – The method to execute, as defined in the phases implementation.

  • verbose (bool) – Whether to print the results returned by the method to stdout.

  • exclude_category (str) – The category to exclude from the execution (either baseline or not_baseline).

  • exclude_pipeline_names (list[str]) – The pipeline names to exclude from the execution.

  • **kwargs (dict) – Additional keyword arguments that are method-specific.

Returns:

results – The results of the execution.

Return type:

dict
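The dispatch described above (dot-notation lookup, per-pipeline threads, and skipping pipelines that are copies of one already processed) can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `DummyModel`, `DummyPipeline`, `resolve_method`, and `execute_on_all` are hypothetical names, and detecting copies by object identity is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class DummyModel:
    """Hypothetical stand-in for a nested attribute of a pipeline."""

    def fit(self, scale=1):
        return 10 * scale


class DummyPipeline:
    """Hypothetical stand-in for a Pipeline object."""

    def __init__(self):
        self.model = DummyModel()


def resolve_method(obj, method_name):
    # Walk "model.fit" one attribute at a time: obj.model, then obj.model.fit.
    return reduce(getattr, method_name.split("."), obj)


def execute_on_all(pipelines, method_name, exclude_category="",
                   exclude_pipeline_names=(), **kwargs):
    # pipelines maps category -> pipeline name -> pipeline object.
    seen, jobs = set(), {}
    with ThreadPoolExecutor() as pool:
        for category, named in pipelines.items():
            if category == exclude_category:
                continue
            for name, pipeline in named.items():
                if name in exclude_pipeline_names or id(pipeline) in seen:
                    continue  # excluded, or the same object was already submitted
                seen.add(id(pipeline))
                method = resolve_method(pipeline, method_name)
                jobs[(category, name)] = pool.submit(method, **kwargs)
    # Leaving the with-block waits for all submitted calls to finish.
    return {key: future.result() for key, future in jobs.items()}


shared = DummyPipeline()
pipelines = {
    "baseline": {"dummy": DummyPipeline()},
    "not_baseline": {"forest": shared, "boosting": shared},
}
results = execute_on_all(pipelines, "model.fit", scale=2)
```

In this sketch "boosting" is absent from `results` because it refers to the same object as "forest", which mirrors the note on verbose output.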

create_pipeline_divergence(category: str, pipelineName: str, print_results: bool = False) → Pipeline[source]

Initially, all pipelines point to the same object. This function copies the given pipeline in its current state into a new, independent pipeline object. Subsequent changes to this pipeline affect only the copy.

Parameters:
  • category (str) – The category to create a divergence for.

  • pipelineName (str) – The pipeline name to create a divergence for.

  • print_results (bool) – Whether to print the results.

Returns:

newPipeline – The new pipeline object.

Return type:

Pipeline
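The idea of diverging a shared pipeline can be sketched as below. This is an illustration only: `DummyPipeline` and `diverge` are hypothetical names, and using `copy.deepcopy` is an assumption about how an independent copy might be made, not a statement about the library's code.

```python
import copy


class DummyPipeline:
    """Hypothetical stand-in for a Pipeline object."""

    def __init__(self):
        self.params = {"max_depth": 3}


# Two names initially refer to the same pipeline object.
shared = DummyPipeline()
pipelines = {"not_baseline": {"forest": shared, "boosting": shared}}


def diverge(pipelines, category, pipeline_name):
    # Replace the shared reference with an independent deep copy.
    new_pipeline = copy.deepcopy(pipelines[category][pipeline_name])
    pipelines[category][pipeline_name] = new_pipeline
    return new_pipeline


new_pipeline = diverge(pipelines, "not_baseline", "boosting")
new_pipeline.params["max_depth"] = 8  # only the copy changes
```

After the call, "forest" still holds the original object with its original parameters, while "boosting" holds the modified copy.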

deserialize_models(models_to_deserialize: dict[str, str])[source]

Deserializes the models.

Parameters:

models_to_deserialize (dict[str, str]) – The models to deserialize.

deserialize_pipelines(pipelines_to_deserialize: dict[str, str]) → None[source]

Deserializes the pipelines.

Parameters:

pipelines_to_deserialize (dict[str, str]) – The pipelines to deserialize.

evaluate_store_final_models()[source]

Evaluates and stores the final models (post-tuning).

Return type:

None

fit_final_models()[source]

Fits the final models (post-tuning).

Return type:

None

property pipeline_state

select_best_performing_model(metric: str)[source]

Selects the best-performing model based on the classification report.

Parameters:

metric (str) – The metric to use to select the best performing model.

Returns:

  • best_model_name (str) – The name of the best performing model.

  • best_score (float) – The score of the best performing model.
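Selecting by metric can be sketched as follows. The layout of `reports` (model name mapped to a flat dictionary of scores), the helper name `select_best`, the assumption that higher scores are better, and the numbers themselves are all illustrative and not taken from the library.

```python
def select_best(reports, metric):
    # reports maps model name -> {metric name: score} (assumed layout).
    best_name = max(reports, key=lambda name: reports[name][metric])
    return best_name, reports[best_name][metric]


# Made-up scores, for illustration only.
reports = {
    "forest": {"accuracy": 0.91, "f1": 0.88},
    "boosting": {"accuracy": 0.93, "f1": 0.86},
}
best_model_name, best_score = select_best(reports, "accuracy")
```

Choosing a different metric can change the winner: with "f1", this sketch would pick "forest" instead.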

serialize_models(models_to_serialize: list[str]) → None[source]

Serializes only the selected models, out of all the models in all the pipelines.

Parameters:

models_to_serialize (list[str]) – The models to serialize.

serialize_pipelines(pipelines_to_serialize: list[str]) → None[source]

Serializes the pipelines.

Parameters:

pipelines_to_serialize (list[str]) – The pipelines to serialize.
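A serialize/deserialize round trip for a selected subset can be sketched as below. Several things here are assumptions: the standard-library `pickle` module stands in for the default 'joblib' serializer, the `dict[str, str]` arguments of the deserialize methods are taken to map a name to a file path, and `serialize_selected` and `deserialize_selected` are hypothetical helper names.

```python
import pickle
import tempfile
from pathlib import Path


def serialize_selected(objects, names, directory):
    # Write only the requested objects; return name -> file path.
    paths = {}
    for name in names:
        path = Path(directory) / f"{name}.pkl"
        with open(path, "wb") as handle:
            pickle.dump(objects[name], handle)
        paths[name] = str(path)
    return paths


def deserialize_selected(paths):
    # Read each file back into a name -> object mapping.
    loaded = {}
    for name, path in paths.items():
        with open(path, "rb") as handle:
            loaded[name] = pickle.load(handle)
    return loaded


# Plain dicts stand in for fitted models or pipelines.
models = {"forest": {"max_depth": 3}, "boosting": {"max_depth": 8}}
with tempfile.TemporaryDirectory() as directory:
    saved = serialize_selected(models, ["forest"], directory)
    restored = deserialize_selected(saved)
```

Only "forest" is written and restored here; "boosting" is left untouched because it was not in the list of names.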