library.phases.phases_implementation.dataset.split.strategies.noTimeSeries module

class library.phases.phases_implementation.dataset.split.strategies.noTimeSeries.NoTimeSeries(dataset)

Bases: Split

asses_split_classifier(p: float, step: float, upper_bound: float = 0.5, save_plots: bool = False, save_path: str = None) → DataFrame

Assesses the split of the dataframe for classification tasks.

Parameters:

p (float) – The percentage of the dataframe to split
step (float) – The step size for the split
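upper_bound (float) – The upper bound for the split. Defaults to 0.5
save_plots (bool) – If True, plots of the split assessment are saved. Defaults to False
save_path (str) – The path where the plots are saved. Defaults to None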

Returns:

df_split_assesment – A dataframe with the split assessment

Return type:

pd.DataFrame
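
The docstring above does not say how the assessment is computed. The following is a minimal sketch of one plausible reading, not the library's implementation: sweep hold-out fractions from p to upper_bound in increments of step and record how far the class shares in each hold-out drift from the full data. The function name assess_holdout_fractions, the output column names, the seed argument and the toy labels are all assumptions made for illustration.

```python
import pandas as pd


def assess_holdout_fractions(y, p, step, upper_bound=0.5, seed=0):
    """Hypothetical helper: compare class shares of each hold-out with the full data."""
    overall = y.value_counts(normalize=True)
    # Shuffle once so that every hold-out is a prefix of the same random order.
    shuffled = y.sample(frac=1.0, random_state=seed)
    rows = []
    fraction = p
    # The small tolerance absorbs floating-point drift from repeated addition.
    while fraction <= upper_bound + 1e-9:
        n_holdout = int(round(len(shuffled) * fraction))
        holdout = shuffled.iloc[:n_holdout]
        shares = holdout.value_counts(normalize=True).reindex(
            overall.index, fill_value=0.0
        )
        rows.append({
            "holdout_fraction": round(fraction, 6),
            "holdout_rows": n_holdout,
            # Largest absolute difference between hold-out and overall class shares.
            "max_class_share_gap": float((shares - overall).abs().max()),
        })
        fraction += step
    return pd.DataFrame(rows)


# Toy labels (assumed): 70 benign and 30 malware samples.
labels = pd.Series(["benign"] * 70 + ["malware"] * 30)
report = assess_holdout_fractions(labels, p=0.1, step=0.1, upper_bound=0.5)
```

Here report holds one row per candidate fraction from 0.1 to 0.5, so a small max_class_share_gap would suggest that a hold-out of that size preserves the class balance reasonably well.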

split_data(y_column: str, otherColumnsToDrop: list[str] = [], train_size: float = 0.8, validation_size: float = 0.1, test_size: float = 0.1, save_plots: bool = False, save_path: str = None) → None

Splits the dataframe into training, validation and test sets

Parameters:

y_column (str) – The column name of the target variable
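otherColumnsToDrop (list[str]) – The columns to drop from the dataframe (e.g., record identifiers). Defaults to an empty list
train_size (float) – The fraction of rows assigned to the training set. Defaults to 0.8
validation_size (float) – The fraction of rows assigned to the validation set. Defaults to 0.1
test_size (float) – The fraction of rows assigned to the test set. Defaults to 0.1
save_plots (bool) – Whether to save plots of the feature distributions. Defaults to False
save_path (str) – The path where the plots are saved. Defaults to None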

Returns:

X_train (pd.DataFrame) – The training set features
X_val (pd.DataFrame) – The validation set features
X_test (pd.DataFrame) – The test set features
y_train (pd.Series) – The training set target
y_val (pd.Series) – The validation set target
y_test (pd.Series) – The test set target
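
A three-way split of this kind can be sketched with pandas alone. This is an illustration under stated assumptions, not the code behind split_data: the helper three_way_split, its seed argument, the shuffle-then-slice approach and the toy columns are all hypothetical, and the real method may stratify or split differently.

```python
import pandas as pd


def three_way_split(df, y_column, columns_to_drop=None,
                    train_size=0.8, validation_size=0.1, test_size=0.1,
                    seed=0):
    """Hypothetical helper: shuffle rows, then cut them into three contiguous blocks."""
    if abs(train_size + validation_size + test_size - 1.0) > 1e-9:
        raise ValueError("train_size, validation_size and test_size must sum to 1")
    # Use None as the default instead of a mutable list shared across calls.
    columns_to_drop = list(columns_to_drop or [])

    # Shuffle all rows reproducibly before slicing.
    shuffled = df.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    X = shuffled.drop(columns=[y_column, *columns_to_drop])
    y = shuffled[y_column]

    n = len(shuffled)
    train_end = int(round(n * train_size))
    val_end = train_end + int(round(n * validation_size))

    return (X.iloc[:train_end], X.iloc[train_end:val_end], X.iloc[val_end:],
            y.iloc[:train_end], y.iloc[train_end:val_end], y.iloc[val_end:])


# Toy data (assumed): 100 rows with an identifier, one feature and a binary label.
df = pd.DataFrame({
    "record_id": range(100),
    "feature": [i * 0.5 for i in range(100)],
    "label": [i % 2 for i in range(100)],
})
X_train, X_val, X_test, y_train, y_val, y_test = three_way_split(
    df, y_column="label", columns_to_drop=["record_id"]
)
```

With the default sizes, the 100 toy rows are divided 80/10/10, and neither the target column nor the dropped identifier column appears in the feature frames.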

