library.phases.phases_implementation.dataset.split.strategies.base module

class library.phases.phases_implementation.dataset.split.strategies.base.Split(dataset)

Bases: ABC

plot_per_set_distribution(features: list[str], save_plots: bool = False, save_path: str = None)

Plots the distribution of the given features for the training, validation and test sets. This is useful for checking whether the three sets have similar statistical distributions. Note: for high-dimensional datasets this is computationally expensive.

Parameters:

features: list[str]: The names of the features to plot
save_plots: bool: Whether to save the plots
save_path: str: The path to save the plots
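The comparison these plots support can also be approximated numerically. The sketch below is not the library's implementation: it uses only the Python standard library, and the names `per_set_summary` and `sets`, as well as the row-dictionary layout, are hypothetical. It reports the mean and standard deviation of each feature per set, which gives a cheap first check before plotting.

```python
from statistics import mean, pstdev


def per_set_summary(sets, features):
    """Return {feature: {set_name: (mean, std)}} for each requested feature.

    `sets` maps a set name to a list of row dicts. This layout is an
    assumption for illustration, not the library's internal representation.
    """
    summary = {}
    for feature in features:
        summary[feature] = {}
        for name, rows in sets.items():
            values = [row[feature] for row in rows]
            # Population standard deviation of the feature within this set
            summary[feature][name] = (mean(values), pstdev(values))
    return summary


# Hypothetical toy data standing in for the three sets
sets = {
    "train": [{"age": 30}, {"age": 40}, {"age": 50}],
    "validation": [{"age": 35}, {"age": 45}],
    "test": [{"age": 38}, {"age": 42}],
}

summary = per_set_summary(sets, ["age"])
for set_name, (mu, sigma) in summary["age"].items():
    print(f"{set_name}: mean={mu:.2f}, std={sigma:.2f}")
```

Similar means with very different spreads, as in this toy data, are the kind of mismatch a per-set distribution plot would make visible.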

abstractmethod split_data(y_column: str, otherColumnsToDrop: list[str] = [], train_size: float = 0.8, validation_size: float = 0.1, test_size: float = 0.1, plot_distribution: bool = True, **kwargs)
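Because `split_data` is abstract, a concrete strategy has to subclass the base class and implement it. The following is a minimal sketch under several assumptions, not the library's code: `SplitSketch` is a simplified stand-in for the documented base class, `RandomSplitSketch` and the `seed` keyword are hypothetical, the dataset is assumed to be a list of row dictionaries, and the return value (three lists) is a guess since the return type is not documented here. The sketch also uses `None` rather than `[]` as the default for `otherColumnsToDrop` to avoid a shared mutable default.

```python
import random
from abc import ABC, abstractmethod


class SplitSketch(ABC):
    """Simplified stand-in for the documented base class (hypothetical)."""

    def __init__(self, dataset):
        self.dataset = dataset  # assumed here: a list of row dicts

    @abstractmethod
    def split_data(self, y_column, otherColumnsToDrop=None, train_size=0.8,
                   validation_size=0.1, test_size=0.1,
                   plot_distribution=True, **kwargs):
        ...


class RandomSplitSketch(SplitSketch):
    """Shuffles the rows and cuts them into three contiguous slices."""

    def split_data(self, y_column, otherColumnsToDrop=None, train_size=0.8,
                   validation_size=0.1, test_size=0.1,
                   plot_distribution=True, **kwargs):
        # y_column and plot_distribution are accepted to mirror the documented
        # signature but are not used in this simplified sketch.
        if abs(train_size + validation_size + test_size - 1.0) > 1e-9:
            raise ValueError("train, validation and test sizes must sum to 1")

        # Drop the unwanted columns from a copy of every row
        drop = set(otherColumnsToDrop or [])
        rows = [{k: v for k, v in row.items() if k not in drop}
                for row in self.dataset]

        # Seeded shuffle so the split is reproducible ("seed" is hypothetical)
        random.Random(kwargs.get("seed", 0)).shuffle(rows)

        n_train = round(len(rows) * train_size)
        n_val = round(len(rows) * validation_size)
        train = rows[:n_train]
        validation = rows[n_train:n_train + n_val]
        test = rows[n_train + n_val:]  # remainder goes to the test set
        return train, validation, test


dataset = [{"id": i, "x": i * 2, "y": i % 2} for i in range(20)]
splitter = RandomSplitSketch(dataset)
train, validation, test = splitter.split_data("y", otherColumnsToDrop=["id"])
print(len(train), len(validation), len(test))
```

Assigning the remainder to the test set keeps every row in exactly one set even when the rounded sizes do not add up to the dataset length.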