library.phases.phases_implementation.dataset.split.strategies.noTimeSeries module

class library.phases.phases_implementation.dataset.split.strategies.noTimeSeries.NoTimeSeries(dataset)

Bases: Split

asses_split_classifier(p: float, step: float, upper_bound: float = 0.5, save_plots: bool = False, save_path: str = None) → DataFrame

Assesses the split of the dataframe for classification tasks.

Parameters:
  • p (float) – The percentage of the dataframe to split

  • step (float) – The step size for the split

  • upper_bound (float) – The upper bound for the split

  • save_plots (bool) – If True, the split assessment plots are saved

  • save_path (str) – The location the plots are saved to (defaults to None)

Returns:

df_split_assesment – A dataframe with the split assessment

Return type:

pd.DataFrame
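The description above does not say which metric the assessment computes. As a library-independent sketch of one plausible reading, the code below walks candidate holdout fractions from p to upper_bound in increments of step and reports how far the class shares of the held-out rows drift from those of the remaining rows. The function names and the max_share_gap metric are assumptions made for illustration; they are not taken from NoTimeSeries.

```python
import random
from collections import Counter


def class_shares(labels):
    """Return the share of each class label in a list of labels."""
    counts = Counter(labels)
    n = len(labels)
    return {label: counts[label] / n for label in counts}


def assess_holdout_fractions(labels, p, step, upper_bound=0.5, seed=0):
    """Hypothetical helper: one row per candidate holdout fraction.

    Fractions are p, p + step, ... up to and including upper_bound.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if not labels:
        raise ValueError("labels must not be empty")

    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the sketch reproducible

    rows = []
    k = 0
    while True:
        fraction = p + k * step
        if fraction > upper_bound + 1e-9:  # small tolerance for float accumulation
            break
        n_holdout = round(len(shuffled) * fraction)
        holdout, kept = shuffled[:n_holdout], shuffled[n_holdout:]
        holdout_shares = class_shares(holdout)
        kept_shares = class_shares(kept)
        # Largest absolute difference in class share between the two parts
        gap = max(
            abs(holdout_shares.get(c, 0.0) - kept_shares.get(c, 0.0))
            for c in set(holdout_shares) | set(kept_shares)
        )
        rows.append(
            {"fraction": round(fraction, 10), "n_holdout": n_holdout, "max_share_gap": gap}
        )
        k += 1
    return rows


labels = ["a"] * 70 + ["b"] * 30
report = assess_holdout_fractions(labels, p=0.1, step=0.1, upper_bound=0.3)
for row in report:
    print(row)
```

The returned list of dictionaries can be passed to pd.DataFrame(...) to obtain a table with one row per candidate fraction.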

split_data(y_column: str, otherColumnsToDrop: list[str] = [], train_size: float = 0.8, validation_size: float = 0.1, test_size: float = 0.1, save_plots: bool = False, save_path: str = None) → None

Splits the dataframe into training, validation and test sets.

Parameters:
  • y_column (str) – The column name of the target variable

  • otherColumnsToDrop (list[str]) – The columns to drop from the dataframe (e.g., record identifiers)

  • train_size (float) – The fraction of the dataframe assigned to the training set

  • validation_size (float) – The fraction of the dataframe assigned to the validation set

  • test_size (float) – The fraction of the dataframe assigned to the test set

  • save_plots (bool) – If True, the plots of the feature distributions are saved

  • save_path (str) – The location the plots are saved to (defaults to None)

Returns:

  • X_train (pd.DataFrame) – The features of the training set

  • X_val (pd.DataFrame) – The features of the validation set

  • X_test (pd.DataFrame) – The features of the test set

  • y_train (pd.Series) – The target values of the training set

  • y_val (pd.Series) – The target values of the validation set

  • y_test (pd.Series) – The target values of the test set
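To make the three size parameters concrete, here is a minimal sketch of a shuffled three-way partition using only the standard library. It is not the implementation behind split_data. The helper name split_indices and the seed argument are hypothetical, and the sketch assumes the sizes are fractions that sum to 1, as the defaults 0.8, 0.1 and 0.1 suggest.

```python
import random


def split_indices(n_rows, train_size=0.8, validation_size=0.1, test_size=0.1, seed=0):
    """Hypothetical helper: shuffled row positions for train, validation and test."""
    total = train_size + validation_size + test_size
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"sizes must sum to 1.0, got {total}")

    positions = list(range(n_rows))
    random.Random(seed).shuffle(positions)  # no time ordering is preserved

    n_train = round(n_rows * train_size)
    n_val = round(n_rows * validation_size)

    train = positions[:n_train]
    val = positions[n_train:n_train + n_val]
    test = positions[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(100)
print(len(train_idx), len(val_idx), len(test_idx))  # 80 10 10
```

With pandas, each list of positions can be passed to DataFrame.iloc to select the matching rows, after separating the y_column target from the feature columns.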