library.pipeline.pipeline_manager module

class library.pipeline.pipeline_manager.PipelineManager(pipelines: dict[str, dict[str, Pipeline]], serializer_type: str = 'joblib', variables: dict = None)

Bases: object

Trains and evaluates all pipelines.

all_pipelines_execute(methodName: str, verbose: bool = False, exclude_category: str = '', exclude_pipeline_names: list[str] = [], **kwargs)

Executes a method on all pipelines, using threads for parallelization. The method name can use dot notation to reach nested attributes (e.g. “model.fit”).

Note on verbose: if a given pipeline does not appear in the results, it has already been processed (it is a copy of another pipeline).

Parameters:

methodName (str) – The method to execute, as defined in the phases implementation.
verbose (bool) – Whether to print the results returned by the method to stdout.
exclude_category (str) – The category to exclude from the execution (either baseline or not_baseline).
exclude_pipeline_names (list[str]) – The pipeline names to exclude from the execution.
**kwargs (dict) – Additional keyword arguments that are method-specific.

Returns:

results – The results of the execution.

Return type:

dict
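The dispatch described above can be pictured roughly as follows. This is a minimal sketch, not the library's implementation: `resolve_method`, `run_on_all`, `DummyModel`, and `DummyPipeline` are hypothetical names, and the sketch assumes that pipelines sharing one object are run only once.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


class DummyModel:
    """Hypothetical stand-in for a nested attribute of a pipeline."""

    def fit(self, scale=1):
        return 10 * scale


class DummyPipeline:
    """Hypothetical stand-in for the library's Pipeline class."""

    def __init__(self):
        self.model = DummyModel()


def resolve_method(obj, dotted_name):
    # Walk "model.fit" attribute by attribute: obj.model, then obj.model.fit.
    return reduce(getattr, dotted_name.split("."), obj)


def run_on_all(pipelines, method_name, exclude_category="",
               exclude_pipeline_names=(), **kwargs):
    seen = set()   # ids of pipeline objects that were already scheduled
    futures = {}
    with ThreadPoolExecutor() as pool:
        for category, named in pipelines.items():
            if category == exclude_category:
                continue
            for name, pipeline in named.items():
                if name in exclude_pipeline_names or id(pipeline) in seen:
                    continue
                seen.add(id(pipeline))
                method = resolve_method(pipeline, method_name)
                futures[(category, name)] = pool.submit(method, **kwargs)
        # Collect the results while the pool is still open.
        return {key: future.result() for key, future in futures.items()}


shared = DummyPipeline()
pipelines = {
    "baseline": {"a": shared},
    "not_baseline": {"b": shared, "c": DummyPipeline()},
}
results = run_on_all(pipelines, "model.fit", scale=2)
```

Here pipeline "b" is absent from `results` because it is the same object as "a", which mirrors the note on verbose output.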

create_pipeline_divergence(category: str, pipelineName: str, print_results: bool = False) → Pipeline

Initially, all pipelines point to the same object. This function copies the pipeline in its current state and creates a new, independent pipeline object. Subsequent changes to this pipeline affect only the copy.

Parameters:

category (str) – The category to create a divergence for.
pipelineName (str) – The pipeline name to create a divergence for.
print_results (bool) – Whether to print the results.

Returns:

newPipeline – The new pipeline object.

Return type:

Pipeline
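The idea of diverging from a shared object can be illustrated with a small sketch. It assumes a deep copy is what makes the new pipeline independent; `diverge` and `DummyPipeline` are hypothetical names, not part of the library.

```python
import copy


class DummyPipeline:
    """Hypothetical stand-in for the library's Pipeline class."""

    def __init__(self):
        self.params = {"max_depth": 3}


# Both names initially point to the same pipeline object.
shared = DummyPipeline()
pipelines = {"not_baseline": {"tree": shared, "forest": shared}}


def diverge(pipelines, category, pipeline_name):
    # Replace the shared reference with an independent deep copy.
    new_pipeline = copy.deepcopy(pipelines[category][pipeline_name])
    pipelines[category][pipeline_name] = new_pipeline
    return new_pipeline


forest = diverge(pipelines, "not_baseline", "forest")
forest.params["max_depth"] = 10  # only the diverged copy changes
```

After the call, "tree" still refers to the original shared object, while "forest" refers to its own copy.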

deserialize_models(models_to_deserialize: dict[str, str])

Deserializes the models.

Parameters:

models_to_deserialize (dict[str, str]) – The models to deserialize.

deserialize_pipelines(pipelines_to_deserialize: dict[str, str]) → None

Deserializes the pipelines.

Parameters:

pipelines_to_deserialize (dict[str, str]) – The pipelines to deserialize.

evaluate_store_final_models()

Evaluates and stores the final models (post-tuning).

Return type:

None

fit_final_models()

Fits the final models (post-tuning).

Return type:

None

property pipeline_state

select_best_performing_model(metric: str)

Selects the best-performing model based on the classification report.

Parameters:

metric (str) – The metric to use to select the best performing model.

Returns:

best_model_name (str) – The name of the best performing model.
best_score (float) – The score of the best performing model.
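A selection of this kind can be sketched as below. This is an illustration only: `pick_best` is a hypothetical helper, the report layout (model name mapped to metric scores) is an assumption, the sketch assumes that a higher score is better, and the scores are made-up example values.

```python
def pick_best(reports, metric):
    # reports maps model name -> {metric name: score}; higher is assumed better.
    best_name = max(reports, key=lambda name: reports[name][metric])
    return best_name, reports[best_name][metric]


# Made-up scores, purely for illustration.
reports = {
    "logreg": {"accuracy": 0.91, "f1-score": 0.88},
    "forest": {"accuracy": 0.94, "f1-score": 0.90},
}
best_model_name, best_score = pick_best(reports, "f1-score")
```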

serialize_models(models_to_serialize: list[str]) → None

Serializes only the selected models, out of all the models in all the pipelines.

Parameters:

models_to_serialize (list[str]) – The models to serialize.

serialize_pipelines(pipelines_to_serialize: list[str]) → None

Serializes the pipelines.

Parameters:

pipelines_to_serialize (list[str]) – The pipelines to serialize.
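A serialize/deserialize round trip might look like the sketch below. It rests on several assumptions: the standard-library `pickle` module stands in for the default 'joblib' serializer, the dict[str, str] arguments are read as a mapping from name to file path, and `serialize_selected` and `deserialize_selected` are hypothetical helpers rather than library functions.

```python
import pickle
import tempfile
from pathlib import Path


def serialize_selected(models, names, directory):
    # Write only the requested models; return a name -> file path mapping.
    paths = {}
    for name in names:
        path = Path(directory) / f"{name}.pkl"
        with open(path, "wb") as fh:
            pickle.dump(models[name], fh)
        paths[name] = str(path)
    return paths


def deserialize_selected(paths):
    # paths maps name -> file path (an assumed reading of dict[str, str]).
    loaded = {}
    for name, path in paths.items():
        with open(path, "rb") as fh:
            loaded[name] = pickle.load(fh)
    return loaded


# Plain dicts stand in for fitted models in this illustration.
models = {"logreg": {"coef": [0.1, 0.2]}, "forest": {"n_trees": 50}}
with tempfile.TemporaryDirectory() as tmp:
    paths = serialize_selected(models, ["forest"], tmp)
    restored = deserialize_selected(paths)
```

Only "forest" is written and restored; "logreg" is left untouched because it was not selected.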
