library.phases.phases_implementation.data_preprocessing.uncomplete_data module

class library.phases.phases_implementation.data_preprocessing.uncomplete_data.UncompleteData(dataset: Dataset)

Bases: object

analyze_duplicates(save_plots: bool = False, save_path: str = None) → str

Report and optionally visualise duplicate rows.

Parameters:
  • save_plots (bool, default=False) – If True, show a bar plot of duplicate counts per column.

  • save_path (str, default=None) – Path where the plot is saved.

Returns:

Diagnostic string with the number of duplicate rows found.

Return type:

str
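
As a rough illustration of the kind of check this method reports on, the sketch below counts fully duplicated rows with plain pandas. It is not the library's implementation: count_duplicate_rows and the sample frame are hypothetical, and the wording of the returned message is an assumption.

```python
import pandas as pd


def count_duplicate_rows(df: pd.DataFrame) -> str:
    """Hypothetical helper, not part of UncompleteData."""
    # duplicated() marks every repeat of an earlier row as True,
    # so the first occurrence of each row is not counted.
    n_duplicates = int(df.duplicated().sum())
    return f"Duplicate rows found: {n_duplicates}"


# Illustrative data: the second row repeats the first one.
df = pd.DataFrame({"a": [1, 1, 2, 2], "b": ["x", "x", "y", "z"]})
report = count_duplicate_rows(df)
print(report)
```

Counting with duplicated() leaves the frame untouched, which fits a method that only reports and leaves removal to a separate call.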

get_missing_values(placeholders: list[str] | None = None, save_plots: bool = False, save_path: str = None)

Return the subset of rows that contain any missing value.

Parameters:
  • placeholders (list[str] | None) – Additional strings that should be considered NA (e.g., “N/A”, “-1”).

  • save_plots (bool, default=False) – If True, show a bar plot of missing-value counts per column.

  • save_path (str, default=None) – Path where the plot is saved.

Returns:

Rows that include at least one missing value, or None if the dataset has no missing values.

Return type:

pandas.DataFrame | None

remove_duplicates() → str

Remove duplicate rows from the dataset.

Returns:

Message indicating the number of duplicate rows removed.

Return type:

str
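
For orientation, the sketch below shows one way to drop duplicate rows and produce such a message with plain pandas. drop_duplicate_rows is a hypothetical stand-in, and both the message format and the choice to keep the first occurrence of each row are assumptions, not confirmed behaviour of remove_duplicates().

```python
import pandas as pd


def drop_duplicate_rows(df: pd.DataFrame) -> tuple[pd.DataFrame, str]:
    """Hypothetical helper, not part of UncompleteData."""
    # drop_duplicates() keeps the first occurrence of each row by default.
    deduplicated = df.drop_duplicates()
    removed = len(df) - len(deduplicated)
    return deduplicated, f"Removed {removed} duplicate rows"


# Illustrative data: rows 0, 1 and 2 are identical, so two rows are dropped.
df = pd.DataFrame({"a": [1, 1, 1, 2], "b": ["x", "x", "x", "y"]})
deduplicated, message = drop_duplicate_rows(df)
print(message)
```

The sketch returns a new frame together with the message. The documented method returns only the message, which suggests it updates the dataset held by the class instead.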