library.phases.phases_implementation.dataset.split.strategies.noTimeSeries module
- class library.phases.phases_implementation.dataset.split.strategies.noTimeSeries.NoTimeSeries(dataset)
Bases: Split
- asses_split_classifier(p: float, step: float, upper_bound: float = 0.5, save_plots: bool = False, save_path: str = None) -> DataFrame
Assesses the split of the dataframe for classification tasks.
- Parameters:
p (float) – The percentage of the dataframe to split
step (float) – The step size for the split
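upper_bound (float) – The upper bound for the split (default: 0.5)
save_plots (bool) – If True, the split assessment will be plotted and saved (default: False)
save_path (str) – Where to save the plots (default: None)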
- Returns:
df_split_assesment – A dataframe with the split assessment
- Return type:
pd.DataFrame
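The documentation above does not say how p, step and upper_bound interact, so the following is only a rough sketch of one plausible reading: sweep the held-out fraction from p up to upper_bound in increments of step, and record how far the class shares in each held-out part drift from the overall class shares. The names sweep_holdout_fractions, class_shares and max_drift are hypothetical and are not part of the library. The sketch uses only the standard library and returns a list of dicts rather than a pd.DataFrame.

```python
import random
from collections import Counter


def class_shares(labels):
    """Return the share of each class in a list of labels."""
    counts = Counter(labels)
    return {cls: counts[cls] / len(labels) for cls in counts}


def sweep_holdout_fractions(labels, p, step, upper_bound=0.5, seed=0):
    """Hypothetical helper: try held-out fractions p, p + step, ..., upper_bound.

    This is an assumed interpretation, not the library's implementation.
    """
    shuffled = list(labels)
    random.Random(seed).shuffle(shuffled)
    overall = class_shares(shuffled)

    report = []
    # Count steps with integers to avoid accumulating float error.
    n_steps = int(round((upper_bound - p) / step))
    for i in range(n_steps + 1):
        fraction = p + i * step
        n_holdout = max(1, round(len(shuffled) * fraction))
        holdout = class_shares(shuffled[:n_holdout])
        # Largest absolute gap between a class share in the held-out
        # part and the same class share in the full data.
        max_drift = max(
            abs(holdout.get(cls, 0.0) - share) for cls, share in overall.items()
        )
        report.append(
            {"fraction": round(fraction, 10),
             "n_holdout": n_holdout,
             "max_drift": max_drift}
        )
    return report


labels = [0] * 70 + [1] * 30
report = sweep_holdout_fractions(labels, p=0.1, step=0.1, upper_bound=0.5)
for row in report:
    print(row)
```

A small max_drift at a given fraction suggests that a held-out part of that size still reflects the class balance of the full dataset.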
- split_data(y_column: str, otherColumnsToDrop: list[str] = [], train_size: float = 0.8, validation_size: float = 0.1, test_size: float = 0.1, save_plots: bool = False, save_path: str = None) -> None
Splits the dataframe into training, validation and test sets.
- Parameters:
y_column (str) – The column name of the target variable
otherColumnsToDrop (list[str]) – The columns to drop from the dataframe (e.g., record identifiers)
train_size (float) – The size of the training set (default: 0.8)
validation_size (float) – The size of the validation set (default: 0.1)
test_size (float) – The size of the test set (default: 0.1)
save_plots (bool) – Whether to plot the distribution of the features (default: False)
save_path (str) – Where to save the plots (default: None)
- Returns:
X_train (pd.DataFrame) – The training set features
X_val (pd.DataFrame) – The validation set features
X_test (pd.DataFrame) – The test set features
y_train (pd.Series) – The training set target values
y_val (pd.Series) – The validation set target values
y_test (pd.Series) – The test set target values
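As a minimal sketch of the three-way split described above, the following standard-library example shuffles the rows, cuts them by the three size fractions, and separates the target column from the features. It is not the library's implementation: three_way_split and its seed parameter are hypothetical, rows are plain dicts rather than a pd.DataFrame, and shuffling before the cut is an assumption based on the class name NoTimeSeries.

```python
import random


def three_way_split(rows, y_column, other_columns_to_drop=(),
                    train_size=0.8, validation_size=0.1, test_size=0.1,
                    seed=0):
    """Hypothetical helper: shuffle rows and cut them into three parts."""
    if abs(train_size + validation_size + test_size - 1.0) > 1e-9:
        raise ValueError("train, validation and test sizes must sum to 1")

    dropped = set(other_columns_to_drop) | {y_column}
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)

    n = len(shuffled)
    n_train = round(n * train_size)
    n_val = round(n * validation_size)
    parts = (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],  # the test set takes the remainder
    )

    # Features: every column except the target and the dropped columns.
    X = [[{k: v for k, v in row.items() if k not in dropped} for row in part]
         for part in parts]
    # Targets: only the y_column value of each row.
    y = [[row[y_column] for row in part] for part in parts]
    return (*X, *y)  # X_train, X_val, X_test, y_train, y_val, y_test


rows = [{"id": i, "feature": i * 0.5, "label": i % 2} for i in range(100)]
X_train, X_val, X_test, y_train, y_val, y_test = three_way_split(
    rows, y_column="label", other_columns_to_drop=["id"])
print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```

Giving the test set the remainder, rather than rounding its size separately, keeps the three parts disjoint and ensures that every row lands in exactly one of them.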