library.phases.phases_implementation.data_preprocessing.uncomplete_data module
- class library.phases.phases_implementation.data_preprocessing.uncomplete_data.UncompleteData(dataset: Dataset)
Bases: object
- analyze_duplicates(save_plots: bool = False, save_path: str = None) -> str
Report and optionally visualise duplicate rows.
- Parameters:
save_plots (bool, default=False) – If True, a bar plot of duplicate counts per column is displayed.
save_path (str, default=None) – The path to save the plot.
- Returns:
Diagnostic string with the number of duplicate rows found.
- Return type:
str
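The duplicate check described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the helper name report_duplicates and the wording of the returned message are hypothetical, and the sketch assumes that a row counts as a duplicate when it repeats an earlier row in every column.

```python
import pandas as pd


def report_duplicates(df: pd.DataFrame) -> str:
    """Hypothetical stand-in for analyze_duplicates (assumed behaviour)."""
    # duplicated() flags each row that repeats an earlier row in all columns;
    # the first occurrence is not flagged (keep="first" is the default).
    n_duplicates = int(df.duplicated().sum())
    return f"Found {n_duplicates} duplicate rows out of {len(df)}"


df = pd.DataFrame({"a": [1, 1, 2, 1], "b": ["x", "x", "y", "x"]})
message = report_duplicates(df)
print(message)
```

Counting only the repeats, and not the first occurrence, matches the number of rows that a later drop_duplicates() call would remove.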
- get_missing_values(placeholders: list[str] | None = None, save_plots: bool = False, save_path: str = None)
Return the subset of rows that contain any missing value.
- Parameters:
placeholders (list[str] | None) – Additional strings that should be considered NA (e.g., “N/A”, “-1”).
save_plots (bool, default=False) – If True, a bar plot of missing-value counts per column is displayed.
save_path (str, default=None) – The path to save the plot.
- Returns:
Rows that contain at least one missing value, or None if the dataset has no missing values.
- Return type:
pandas.DataFrame | None