library.phases.phases_implementation.dataset.split.strategies.timeSeries module
- class library.phases.phases_implementation.dataset.split.strategies.timeSeries.TimeSeries(dataset)
Bases: Split
- split_data(y_column: str, otherColumnsToDrop: list[str] = [], train_size: float = 0.8, validation_size: float = 0.1, test_size: float = 0.1, plot_distribution: bool = True, **kwargs) -> tuple[DataFrame, DataFrame, DataFrame, Series, Series, Series]
Splits the dataframe into training, validation, and test sets for time series data.
- Parameters:
y_column (str) – The column name of the target variable
otherColumnsToDrop (list[str]) – The columns to drop from the dataframe (e.g., record identifiers). Defaults to an empty list
train_size (float) – The proportion of data to use for training. Defaults to 0.8
validation_size (float) – The proportion of data to use for validation. Defaults to 0.1
test_size (float) – The proportion of data to use for testing. Defaults to 0.1
orderColumns (list[str]) – The columns to order the dataframe by (e.g., date, timestamp). Not listed in the signature above, so presumably passed through **kwargs
plot_distribution (bool) – Whether to plot the distribution of the features
- Returns:
X_train (pd.DataFrame) – The training set features
X_val (pd.DataFrame) – The validation set features
X_test (pd.DataFrame) – The test set features
y_train (pd.Series) – The training set target
y_val (pd.Series) – The validation set target
y_test (pd.Series) – The test set target
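The source of split_data is not reproduced on this page, so the following is only a minimal sketch of what a chronological three-way split typically looks like, under stated assumptions. The function chronological_split and its snake_case parameter names are hypothetical stand-ins, not the library's API; the sketch assumes the rows are sorted by the ordering columns and then cut by position, and it omits the distribution plot.

```python
from typing import Optional

import pandas as pd


def chronological_split(
    df: pd.DataFrame,
    y_column: str,
    order_columns: list[str],
    columns_to_drop: Optional[list[str]] = None,
    train_size: float = 0.8,
    validation_size: float = 0.1,
    test_size: float = 0.1,
):
    """Hypothetical stand-in for TimeSeries.split_data; not the library's code."""
    if abs(train_size + validation_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size, validation_size and test_size must sum to 1")

    # Sort oldest to newest so later rows never leak into earlier partitions.
    ordered = df.sort_values(order_columns).reset_index(drop=True)

    # Separate the target from the features and drop unwanted columns.
    X = ordered.drop(columns=[y_column, *(columns_to_drop or [])])
    y = ordered[y_column]

    # Cut by position rather than at random: train, then validation, then test.
    # The test set takes whatever rows remain after the first two cuts.
    train_end = int(len(ordered) * train_size)
    val_end = train_end + int(len(ordered) * validation_size)

    return (
        X.iloc[:train_end], X.iloc[train_end:val_end], X.iloc[val_end:],
        y.iloc[:train_end], y.iloc[train_end:val_end], y.iloc[val_end:],
    )


# Toy data: 10 daily observations, deliberately shuffled before splitting.
frame = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=10, freq="D"),
    "record_id": range(10),
    "feature": [float(i) for i in range(10)],
    "target": [i * 2 for i in range(10)],
}).sample(frac=1.0, random_state=0)

X_train, X_val, X_test, y_train, y_val, y_test = chronological_split(
    frame, y_column="target", order_columns=["date"], columns_to_drop=["record_id"]
)
```

The return order mirrors the tuple documented above: the three feature frames first, then the three target series. Because the cuts are positional on sorted data, every training row precedes every validation row, which in turn precedes every test row.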