library.pipeline.analysis.pipelines_analysis module¶
- class library.pipeline.analysis.pipelines_analysis.PipelinesAnalysis(pipelines: dict[str, dict[str, Pipeline]])[source]¶
Bases: object
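A minimal construction sketch (the `fitted_models` name and its contents are illustrative placeholders, not part of the library):

```python
from library.pipeline.analysis.pipelines_analysis import PipelinesAnalysis

# Assumed shape: outer keys are phases ("pre", "in", "post"), inner keys are
# model names, values are already-fitted Pipeline objects built elsewhere
# with the library's own utilities.
fitted_models = {"post": {"random_forest": rf_pipeline}}  # placeholder objects

analysis = PipelinesAnalysis(pipelines=fitted_models)
```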
- lime_feature_importance(save_plots: bool = False, save_path: str = None)[source]¶
Computes and plots LIME-based feature importances for ensemble models in the current phase. Generates bar plots of the top contributing features for a single sample.
- Parameters:
save_plots (bool, optional) – Whether to save the generated LIME plots to disk (default is False).
save_path (str, optional) – Directory path where plots should be saved if save_plots is True.
- Returns:
Dictionary mapping each pipeline to its LIME feature importance DataFrame.
- Return type:
dict
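A hedged usage sketch, reusing the `analysis` object from the construction example above (the save directory is illustrative):

```python
# Each entry maps a pipeline to its LIME feature-importance DataFrame;
# plots are written to disk only because save_plots=True.
lime_importances = analysis.lime_feature_importance(
    save_plots=True, save_path="plots/lime"
)
for name, importance_df in lime_importances.items():
    print(name, importance_df.head())
```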
- plot_confusion_matrix(save_plots: bool = False, save_path: str = None)[source]¶
Plots both absolute and relative confusion matrices for all models in the current phase.
For each applicable model, this function computes and displays:
- An absolute confusion matrix (raw counts).
- A relative confusion matrix (normalized by actual class totals, in %).
Conditions such as model exclusions, phase-specific logic, and baseline filtering are handled internally.
- Parameters:
save_plots (bool, optional) – Whether to save the generated plot to disk. Default is False.
save_path (str, optional) – Path to the directory where plots will be saved (if save_plots is True).
- Returns:
residuals (dict) – Dictionary mapping each pipeline to its residuals (misclassified examples).
confusion_matrices (dict) – Dictionary mapping each model name to its absolute and relative confusion matrices.
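The two returned dictionaries can be unpacked directly; a sketch assuming the `analysis` object from above:

```python
# residuals: misclassified examples per pipeline;
# confusion_matrices: absolute and relative matrices per model name.
residuals, confusion_matrices = analysis.plot_confusion_matrix(save_plots=False)
```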
- plot_cross_model_comparison(metrics: list[str] = None, cols: int = 2, save_plots: bool = False, save_path: str = None)[source]¶
Plots a comparison of classification metrics across different models for the current phase. Generates subplots for each selected metric and optionally saves the result.
- Parameters:
metrics (list of str, optional) – List of metric names to include in the plots. If None, default classification metrics are used.
cols (int, optional) – Number of columns in the subplot grid (default is 2).
save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).
save_path (str, optional) – Directory path where plots should be saved if save_plots is True.
- Return type:
None
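For example (the metric names are illustrative; `metrics=None` falls back to the default classification metrics):

```python
# Three-column subplot grid over a custom metric subset.
analysis.plot_cross_model_comparison(
    metrics=["accuracy", "f1", "precision"], cols=3
)
```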
- plot_feature_importance(save_plots: bool = False, save_path: str = None)[source]¶
Plots feature importance for each model in the current phase. Uses built-in importance attributes or permutation importance. Plots only the top features and optionally saves the results to disk.
- Parameters:
save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).
save_path (str, optional) – Directory path where plots should be saved if save_plots is True.
- Return type:
None
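A one-line sketch (the save directory is illustrative):

```python
# Nothing is returned; figures are displayed and, here, also saved to disk.
analysis.plot_feature_importance(save_plots=True, save_path="plots/importance")
```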
- plot_intra_model_comparison(metrics: list[str] = None, save_plots: bool = False, save_path: str = None)[source]¶
Plots training vs. validation/test performance for each model across selected metrics, with one row per model showing side-by-side metric trends for comparison.
- Parameters:
metrics (list of str, optional) – List of metric names to plot. If None, uses default classification metrics.
save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).
save_path (str, optional) – Directory path where plots should be saved if save_plots is True.
- Return type:
None
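A sketch relying on the defaults (an explicit metric list works the same way as in plot_cross_model_comparison above):

```python
# metrics=None selects the default classification metrics; one row of
# train-vs-validation/test trends is drawn per model.
analysis.plot_intra_model_comparison(metrics=None)
```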
- plot_multiclass_reliability_diagram(save_plots: bool = False, save_path: str = None)[source]¶
Plots multiclass reliability diagrams (one-vs-rest) for ensemble or tree-based models. Each class's calibration curve is displayed to assess probabilistic calibration quality.
- Parameters:
save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).
save_path (str, optional) – Directory path where plots should be saved if save_plots is True.
- Return type:
None
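For example (the save directory is illustrative):

```python
# Draws one-vs-rest calibration curves for each class; returns nothing.
analysis.plot_multiclass_reliability_diagram(
    save_plots=True, save_path="plots/calibration"
)
```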
- plot_per_epoch_progress(metrics: list[str], save_plots: bool = False, save_path: str = None)[source]¶
Plots the progression of specified metrics over training epochs for a neural network model.
This function initializes a NeuralNetsPlots object for the feed-forward neural network model corresponding to the current phase, and delegates the plotting of per-epoch metric progress to that object.
- Parameters:
metrics (list of str) – List of metric names to plot over epochs.
save_plots (bool, optional) – Whether to save the generated plots. Default is False.
save_path (str, optional) – Directory path where plots will be saved if save_plots is True.
- Return type:
None
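A sketch with illustrative metric names (they must match whatever the neural-network training loop recorded per epoch):

```python
# metrics is a required argument here; there is no default list.
analysis.plot_per_epoch_progress(metrics=["loss", "accuracy"])
```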
- plot_residuals(save_plots: bool = False, save_path: str = None)[source]¶
Generates diagnostic plots of residuals for each model in the current phase.
- For each applicable model, this function computes residuals and produces a 2x2 grid of:
  - Residuals vs. predicted values
  - Residuals vs. observed values
  - Histogram of residuals with KDE
  - QQ-plot of residuals to assess normality
Titles each figure as: “Residual plots for {modelName} in {phase} phase”.
Filters models according to phase, category, and exclusion rules. Saves plots if save_plots is True.
- Parameters:
save_plots (bool, optional) – Whether to save the generated plots to disk. Default is False.
save_path (str, optional) – Directory path where plots should be saved (used if save_plots is True).
- Return type:
None
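For example (the save directory is illustrative):

```python
# One 2x2 diagnostic figure per applicable model in the current phase.
analysis.plot_residuals(save_plots=True, save_path="plots/residuals")
```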
- plot_results_df(metrics: list[str], save_plots: bool = False, save_path: str = None)[source]¶
Plots general and time-based performance metrics (e.g., fit/predict time) for all models in the current phase. Displays bar charts per metric and optionally saves the results.
- Parameters:
metrics (list of str) – List of metrics to visualize (e.g., accuracy, time_to_fit).
save_plots (bool, optional) – Whether to save the generated plots to disk (default is False).
save_path (str, optional) – Directory path where plots should be saved if save_plots is True.
- Returns:
Concatenated dataframe with the selected metrics for all models.
- Return type:
pd.DataFrame
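Because the concatenated DataFrame is returned, it can be inspected or exported after plotting; a sketch using the metric names given as examples above:

```python
results = analysis.plot_results_df(metrics=["accuracy", "time_to_fit"])
print(results.head())          # inspect the concatenated metrics table
results.to_csv("results.csv")  # optional export of the same data
```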
- plot_results_summary(training_metric: str, performance_metric: str, save_plots: bool = False, save_path: str = None)[source]¶
Generates a scatterplot relating a training or prediction time metric to a classification performance metric for models in the current phase.
The x-axis represents the time metric (“timeToFit” or “timeToPredict”) on a log scale, and the y-axis shows the classification performance metric, adjusted based on the phase (“pre”, “in”, or “post”) to use either validation or test evaluation.
Each point represents a model and is labeled with its name.
- Parameters:
training_metric (str) – Time metric for the x-axis. Must be either "timeToFit" or "timeToPredict".
performance_metric (str) – Performance metric for the y-axis. Must be a valid classification metric.
save_plots (bool, optional) – Whether to save the plot to disk. Default is False.
save_path (str, optional) – Directory path where plots will be saved if save_plots is True.
- Return type:
None
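A closing sketch (the performance metric name is illustrative; the time metric value follows the constraint documented above):

```python
# Log-scaled fit time on x, classification performance on y, one labeled
# point per model in the current phase.
analysis.plot_results_summary(
    training_metric="timeToFit", performance_metric="accuracy"
)
```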