library.phases.phases_implementation.dataset.split.strategies.timeSeries module

class library.phases.phases_implementation.dataset.split.strategies.timeSeries.TimeSeries(dataset)

Bases: Split

plot_time_splits()

Plots the time splits of the dataframe

split_data(y_column: str, otherColumnsToDrop: list[str] = [], train_size: float = 0.8, validation_size: float = 0.1, test_size: float = 0.1, plot_distribution: bool = True, **kwargs) → tuple[DataFrame, DataFrame, DataFrame, Series, Series, Series]

Splits the dataframe into training, validation and test sets for time series data

Parameters:
  • y_column (str) – The column name of the target variable

  • otherColumnsToDrop (list[str]) – The columns to drop from the dataframe (e.g., record identifiers)

  • train_size (float) – The proportion of data to use for training (default: 0.8)

  • validation_size (float) – The proportion of data to use for validation (default: 0.1)

  • test_size (float) – The proportion of data to use for testing (default: 0.1)

  • orderColumns (list[str]) – The columns to order the dataframe by (e.g., date, timestamp). Not part of the explicit signature, so presumably passed through **kwargs

  • plot_distribution (bool) – Whether to plot the distribution of the features (default: True)

Returns:

  • X_train (pd.DataFrame) – The training set features

  • X_val (pd.DataFrame) – The validation set features

  • X_test (pd.DataFrame) – The test set features

  • y_train (pd.Series) – The training set target

  • y_val (pd.Series) – The validation set target

  • y_test (pd.Series) – The test set target
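
The parameters and return values above describe a chronological split. The source of split_data is not shown here, so the following is only a minimal sketch of that kind of split in plain pandas, not the library's implementation. The function chronological_split, its argument names, and the toy dataframe are hypothetical and exist only for illustration.

```python
from typing import Optional

import pandas as pd


def chronological_split(
    df: pd.DataFrame,
    y_column: str,
    order_columns: list,
    columns_to_drop: Optional[list] = None,
    train_size: float = 0.8,
    validation_size: float = 0.1,
    test_size: float = 0.1,
):
    """Hypothetical stand-in for split_data: sort by time, then slice in order."""
    if abs(train_size + validation_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size, validation_size and test_size must sum to 1")

    # Sort by the ordering columns so later rows never end up in an earlier split.
    ordered = df.sort_values(order_columns).reset_index(drop=True)

    # Separate the target from the features and drop unwanted columns.
    X = ordered.drop(columns=[y_column, *(columns_to_drop or [])])
    y = ordered[y_column]

    # Contiguous slices: oldest rows for training, newest rows for testing.
    n = len(ordered)
    train_end = int(n * train_size)
    val_end = train_end + int(n * validation_size)

    X_train, X_val, X_test = X.iloc[:train_end], X.iloc[train_end:val_end], X.iloc[val_end:]
    y_train, y_val, y_test = y.iloc[:train_end], y.iloc[train_end:val_end], y.iloc[val_end:]
    return X_train, X_val, X_test, y_train, y_val, y_test


# Toy data (assumed for illustration): 10 daily records, deliberately shuffled.
toy = pd.DataFrame(
    {
        "date": pd.date_range("2024-01-01", periods=10, freq="D"),
        "record_id": range(10),
        "feature": [float(i) for i in range(10)],
        "target": [i % 2 for i in range(10)],
    }
).sample(frac=1.0, random_state=0)

X_train, X_val, X_test, y_train, y_val, y_test = chronological_split(
    toy, y_column="target", order_columns=["date"], columns_to_drop=["record_id"]
)
```

Unlike a random split, the slices here are contiguous in time, so the validation and test sets contain only observations that come after every training observation.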