library.pipeline.analysis.pipelines_analysis module

class library.pipeline.analysis.pipelines_analysis.PipelinesAnalysis(pipelines: dict[str, dict[str, Pipeline]])[source]

Bases: object
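
Example (a minimal construction sketch; the phase → model-name → fitted Pipeline nesting is an assumption inferred from the constructor's type annotation, and build_pipelines is a hypothetical helper standing in for however the surrounding library assembles its pipelines):

>>> from library.pipeline.analysis.pipelines_analysis import PipelinesAnalysis
>>> pipelines = build_pipelines()  # hypothetical helper; must yield dict[str, dict[str, Pipeline]]
>>> analysis = PipelinesAnalysis(pipelines)

The later examples on this page reuse this analysis object.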

lime_feature_importance(save_plots: bool = False, save_path: str = None)[source]

Computes and plots LIME-based feature importances for ensemble models in the current phase. Generates bar plots of the top contributing features for a single sample.

Parameters:
  • save_plots (bool, optional) – Whether to save the generated LIME plots to disk (default is False).

  • save_path (str, optional) – Directory path where plots should be saved if save_plots is True.

Returns:

Dictionary mapping each pipeline to its LIME feature importance DataFrame.

Return type:

dict
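
Example (a usage sketch; assumes the analysis object from the construction example above, with the returned keys being pipeline names per the description):

>>> lime_dfs = analysis.lime_feature_importance(save_plots=True, save_path="plots/lime")
>>> for pipeline_name, importance_df in lime_dfs.items():
...     print(pipeline_name, importance_df.shape)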

plot_confusion_matrix(save_plots: bool = False, save_path: str = None)[source]

Plots both absolute and relative confusion matrices for all models in the current phase.

For each applicable model, this function computes and displays:
  • An absolute confusion matrix (raw counts).

  • A relative confusion matrix (normalized by actual class totals, in %).

Conditions such as model exclusions, phase-specific logic, and baseline filtering are handled internally.

Parameters:
  • save_plots (bool, optional) – Whether to save the generated plot to disk. Default is False.

  • save_path (str, optional) – Path to the directory where plots will be saved (if save_plots is True).

Returns:

  • residuals (dict) – Dictionary mapping each pipeline to its residuals (misclassified examples).

  • confusion_matrices (dict) – Dictionary mapping each model name to its absolute and relative confusion matrices.
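
Example (a sketch; reading the two documented return values as a tuple is an assumption based on the Returns section):

>>> residuals, confusion_matrices = analysis.plot_confusion_matrix()
>>> list(confusion_matrices)  # model names, each with absolute and relative matrices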

plot_cross_model_comparison(metrics: list[str] = None, cols: int = 2, save_plots: bool = False, save_path: str = None)[source]

Plots a comparison of classification metrics across different models for the current phase. Generates subplots for each selected metric and optionally saves the result.

Parameters:
  • metrics (list of str, optional) – List of metric names to include in the plots. If None, default classification metrics are used.

  • cols (int, optional) – Number of columns in the subplot grid (default is 2).

  • save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).

  • save_path (str, optional) – Directory path where plots should be saved if save_plots is True.

Return type:

None
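
Example (a sketch; "accuracy" and "f1" are assumed metric names, not confirmed by this page):

>>> analysis.plot_cross_model_comparison(metrics=["accuracy", "f1"], cols=2)

Passing metrics=None instead falls back to the default classification metrics.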

plot_feature_importance(save_plots: bool = False, save_path: str = None)[source]

Plots feature importance for each model in the current phase. Uses built-in importance attributes or permutation importance. Only plots top features and optionally saves the results to disk.

Parameters:
  • save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).

  • save_path (str, optional) – Directory path where plots should be saved if save_plots is True.

Return type:

None
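
Example (a minimal sketch, reusing the analysis object from the construction example):

>>> analysis.plot_feature_importance(save_plots=True, save_path="plots/importance")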

plot_intra_model_comparison(metrics: list[str] = None, save_plots: bool = False, save_path: str = None)[source]

Plots training vs. validation/test performance for each model across selected metrics. One row is drawn per model, with side-by-side metric trends for comparison.

Parameters:
  • metrics (list of str, optional) – List of metric names to plot. If None, uses default classification metrics.

  • save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).

  • save_path (str, optional) – Directory path where plots should be saved if save_plots is True.

Return type:

None
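
Example (a sketch; the metric names are assumptions):

>>> analysis.plot_intra_model_comparison(metrics=["accuracy", "recall"])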

plot_multiclass_reliability_diagram(save_plots: bool = False, save_path: str = None)[source]

Plots multiclass reliability diagrams (one-vs-rest) for ensemble or tree-based models. Each class’s calibration curve is displayed to assess probabilistic calibration quality.

Parameters:
  • save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).

  • save_path (str, optional) – Directory path where plots should be saved if save_plots is True.

Return type:

None
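
Example (a minimal sketch):

>>> analysis.plot_multiclass_reliability_diagram(save_plots=True, save_path="plots/calibration")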

plot_per_epoch_progress(metrics: list[str], save_plots: bool = False, save_path: str = None)[source]

Plots the progression of specified metrics over training epochs for a neural network model.

This function initializes a NeuralNetsPlots object for the feed-forward neural network model corresponding to the current phase, and delegates the plotting of per-epoch metric progress to that object.

Parameters:
  • metrics (list of str) – List of metric names to plot over epochs.

  • save_plots (bool, optional) – Whether to save the generated plots. Default is False.

  • save_path (str, optional) – Directory path where plots will be saved if save_plots is True.

Return type:

None
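
Example (a sketch; note that metrics has no default here and must be supplied; "loss" and "accuracy" are assumed metric names):

>>> analysis.plot_per_epoch_progress(metrics=["loss", "accuracy"])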

plot_residuals(save_plots: bool = False, save_path: str = None)[source]

Generates diagnostic plots of residuals for each model in the current phase.

For each applicable model, this function computes residuals and produces a 2x2 grid of:
  1. Residuals vs. Predicted values

  2. Residuals vs. Observed values

  3. Histogram of residuals with KDE

  4. QQ-plot of residuals to assess normality

Titles each figure as: “Residual plots for {modelName} in {phase} phase”.

Filters models according to phase, category, and exclusion rules. Saves plots if save_plots is True.

Parameters:
  • save_plots (bool, optional) – Whether to save the generated plots to disk. Default is False.

  • save_path (str, optional) – Directory path where plots should be saved (used if save_plots is True).

Return type:

None
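
Example (a minimal sketch):

>>> analysis.plot_residuals(save_plots=True, save_path="plots/residuals")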

plot_results_df(metrics: list[str], save_plots: bool = False, save_path: str = None)[source]

Plots general and time-based performance metrics (e.g., fit/predict time) for all models in the current phase. Displays bar charts per metric and optionally saves the results.

Parameters:
  • metrics (list of str) – List of metrics to visualize (e.g., accuracy, time_to_fit).

  • save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).

  • save_path (str, optional) – Directory path where plots should be saved if save_plots is True.

Returns:

Concatenated DataFrame with the selected metrics for all models.

Return type:

pd.DataFrame
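
Example (a sketch; the metric names follow the e.g. list in the description):

>>> results = analysis.plot_results_df(metrics=["accuracy", "time_to_fit"])
>>> results.head()  # concatenated metrics for all models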

plot_results_summary(training_metric: str, performance_metric: str, save_plots: bool = False, save_path: str = None)[source]

Generates a scatterplot relating a training or prediction time metric to a classification performance metric for models in the current phase.

The x-axis represents the time metric (“timeToFit” or “timeToPredict”) on a log scale, and the y-axis shows the classification performance metric, adjusted based on the phase (“pre”, “in”, or “post”) to use either validation or test evaluation.

Each point represents a model and is labeled with its name.

Parameters:
  • training_metric (str) – Time metric for the x-axis. Must be either "timeToFit" or "timeToPredict".

  • performance_metric (str) – Performance metric for the y-axis. Must be a valid classification metric.

  • save_plots (bool, optional) – Whether to save the plot to disk. Default is False.

  • save_path (str, optional) – Directory path where plots will be saved if save_plots is True.

Return type:

None
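
Example (a sketch; "timeToFit" is one of the two documented x-axis values, while "f1" is an assumed classification metric name):

>>> analysis.plot_results_summary(training_metric="timeToFit", performance_metric="f1")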