library.phases.phases_implementation.dataset.split.strategies.timeSeries module

class library.phases.phases_implementation.dataset.split.strategies.timeSeries.TimeSeries(dataset)

Bases: Split

plot_time_splits(): Plots the time splits (training, validation and test) of the dataframe.
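The docstring does not say exactly what the plot draws. A time-split plot usually shows the time span each partition covers, so the sketch below computes that information with pandas instead of plotting it. It assumes a single sortable time column and precomputed cut indices; `split_time_ranges` is a hypothetical helper, not part of this module.

```python
import pandas as pd


def split_time_ranges(order_values: pd.Series, train_end: int, val_end: int) -> pd.DataFrame:
    """Hypothetical helper: start, end and row count of each chronological split.

    Rows [0, train_end) are training, [train_end, val_end) validation,
    and [val_end, len) test, after sorting by time.
    """
    # Sort the time values so each split is a contiguous block of time.
    ordered = order_values.sort_values().reset_index(drop=True)
    parts = {
        "train": ordered.iloc[:train_end],
        "validation": ordered.iloc[train_end:val_end],
        "test": ordered.iloc[val_end:],
    }
    # One row per split: the first and last timestamp it covers plus its size.
    return pd.DataFrame(
        {
            "start": [p.min() for p in parts.values()],
            "end": [p.max() for p in parts.values()],
            "rows": [len(p) for p in parts.values()],
        },
        index=list(parts.keys()),
    )
```

A plotting routine could draw one horizontal bar per row of this table, from `start` to `end`.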

split_data(y_column: str, otherColumnsToDrop: list[str] = [], train_size: float = 0.8, validation_size: float = 0.1, test_size: float = 0.1, plot_distribution: bool = True, **kwargs) → tuple[DataFrame, DataFrame, DataFrame, Series, Series, Series]

Splits the dataframe into training, validation and test sets for time series data.

Parameters:

y_column (str) – The column name of the target variable
otherColumnsToDrop (list[str]) – The columns to drop from the dataframe (e.g., record identifiers)
train_size (float) – The proportion of data to use for training
validation_size (float) – The proportion of data to use for validation
test_size (float) – The proportion of data to use for testing
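orderColumns (list[str]) – The columns to order the dataframe by (e.g., date, timestamp). This is not an explicit parameter in the signature, so it is presumably supplied through **kwargs.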
plot_distribution (bool) – Whether to plot the distribution of the features
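The three size parameters are proportions of the rows. A minimal sketch of how they could map to cut points in a time-ordered dataframe follows. It assumes the proportions must sum to 1 and that any rows lost to rounding go to the test set; `split_boundaries` is hypothetical and not taken from this module.

```python
def split_boundaries(n_rows: int, train_size: float = 0.8,
                     validation_size: float = 0.1,
                     test_size: float = 0.1) -> tuple[int, int]:
    """Hypothetical helper: return (train_end, val_end) row indices.

    After sorting by time, rows [0, train_end) are training,
    [train_end, val_end) validation and [val_end, n_rows) test.
    """
    total = train_size + validation_size + test_size
    # Assumption: the proportions must cover the whole dataset.
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"sizes must sum to 1.0, got {total}")
    # int() truncates, so any leftover rows fall into the test slice.
    train_end = int(n_rows * train_size)
    val_end = train_end + int(n_rows * validation_size)
    return train_end, val_end
```

For example, `split_boundaries(100)` yields `(80, 90)`: 80 training rows, 10 validation rows and 10 test rows.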

Returns:

X_train (pd.DataFrame) – The training set features
X_val (pd.DataFrame) – The validation set features
X_test (pd.DataFrame) – The test set features
y_train (pd.Series) – The training set target
y_val (pd.Series) – The validation set target
y_test (pd.Series) – The test set target
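A minimal pandas sketch of a time-ordered split that returns the same six objects, in the same order, is shown below. It is not the module's implementation. It assumes the rows are sorted by the order columns, sliced into contiguous blocks without shuffling, and that the order columns stay in the feature frames. `chronological_split` and its snake_case parameters are hypothetical.

```python
from typing import Optional

import pandas as pd


def chronological_split(df: pd.DataFrame, y_column: str, order_columns: list[str],
                        other_columns_to_drop: Optional[list[str]] = None,
                        train_size: float = 0.8, validation_size: float = 0.1):
    """Hypothetical sketch of a time-ordered train/validation/test split."""
    # Sort by the time columns so earlier rows always precede later ones.
    ordered = df.sort_values(order_columns).reset_index(drop=True)

    # Separate the target and drop identifier-like columns from the features.
    drop = [y_column] + list(other_columns_to_drop or [])
    X = ordered.drop(columns=drop)
    y = ordered[y_column]

    # Cut points from the proportions; the remainder becomes the test set.
    n = len(ordered)
    train_end = int(n * train_size)
    val_end = train_end + int(n * validation_size)

    # Contiguous slices: validation and test data lie after the training data.
    X_train, X_val, X_test = X.iloc[:train_end], X.iloc[train_end:val_end], X.iloc[val_end:]
    y_train, y_val, y_test = y.iloc[:train_end], y.iloc[train_end:val_end], y.iloc[val_end:]
    return X_train, X_val, X_test, y_train, y_val, y_test
```

Slicing instead of shuffling is what keeps future observations out of the training set, which a random split would not guarantee.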


Efficient Malware Classifier
