library.phases.phases_implementation.EDA.EDA module
- class library.phases.phases_implementation.EDA.EDA.EDA(dataset: Dataset)
Bases: object
This class uses the composition design pattern to create plots from the dataframe held by an instance of the Dataset class. Composition lets the two classes share data (e.g. the dataset object) without inheriting from one another.
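The composition relationship described above can be illustrated with a minimal sketch. The `Dataset` class below is a hypothetical stand-in (a dict of columns rather than a real dataframe), and `column_names` is an illustrative method that is not part of the documented API; only the idea of `EDA` holding a reference to a `Dataset` comes from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for the library's Dataset class."""

    # Column name -> list of values (a real Dataset would hold a dataframe).
    df: dict = field(default_factory=dict)


class EDA:
    """Holds a reference to a Dataset instead of inheriting from it."""

    def __init__(self, dataset: Dataset):
        self.dataset = dataset  # shared object, not a copy

    def column_names(self) -> list[str]:
        # Illustrative helper only; not a documented EDA method.
        return list(self.dataset.df)


dataset = Dataset(df={"age": [31, 45], "income": [40_000, 52_000]})
eda = EDA(dataset)

# A change made through the Dataset is visible to the EDA object,
# because both refer to the same underlying data.
dataset.df["city"] = ["Lyon", "Porto"]
print(eda.column_names())  # ['age', 'income', 'city']
```

Because `EDA` only stores the reference, any other class that receives the same `Dataset` instance would see the same data.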
- barplot_bivariate(features: list[str], target: str, n_cols: int = 3)
Plots bar plots for each specified feature against the target variable.
The function adjusts the figure size for better visibility and optimizes the x-axis ticks, including handling interval-type features by converting them to strings. Plots are arranged in a grid layout based on the specified number of columns.
- Parameters:
features (list of str) – List of feature names to plot on the x-axis.
target (str) – The target variable name to plot on the y-axis.
n_cols (int, optional (default=3)) – Number of columns in the subplot grid.
- Return type:
None
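How a list of features might map onto a subplot grid with `n_cols` columns can be sketched as follows. The helper names `grid_shape` and `grid_position` are hypothetical, and the rule used here (ceiling division, with no more columns than plots) is an assumption about a reasonable layout, not a description of the library's internals.

```python
import math


def grid_shape(n_plots: int, n_cols: int = 3) -> tuple[int, int]:
    """Return (rows, cols) for a grid that fits n_plots subplots."""
    if n_plots < 1 or n_cols < 1:
        raise ValueError("n_plots and n_cols must be positive")
    cols = min(n_plots, n_cols)        # avoid empty columns for short lists
    rows = math.ceil(n_plots / cols)   # enough rows to hold every plot
    return rows, cols


def grid_position(index: int, n_cols: int) -> tuple[int, int]:
    """Return the (row, col) cell of the plot at a given 0-based index."""
    return divmod(index, n_cols)


features = ["age", "income", "city", "tenure", "score", "region", "plan"]
rows, cols = grid_shape(len(features), n_cols=3)
print(rows, cols)  # 3 3

for i, name in enumerate(features):
    row, col = grid_position(i, cols)
    print(f"{name}: row {row}, col {col}")
```

With seven features and three columns, the last row holds a single plot; a plotting routine would typically hide the two unused cells.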
- count_boxplot_descriptive(features: list[str])
Plots the distribution histogram, boxplot, and descriptive statistics summary for each specified feature.
- Parameters:
features (list of str) – List of feature names to analyze and plot.
- Return type:
None
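The kind of descriptive summary that usually accompanies a histogram and boxplot can be sketched with the standard library alone. The `describe` function and the sample values are hypothetical; the 1.5 × IQR outlier rule is the common boxplot whisker convention, and whether this method uses the same statistics is an assumption.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Summary statistics similar to those shown beside a boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


def boxplot_outliers(values: list[float]) -> list[float]:
    """Values outside the 1.5 * IQR whiskers of a conventional boxplot."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


sample = [2, 4, 4, 5, 7, 9, 30]  # made-up values for illustration
summary = describe(sample)
print(summary["q1"], summary["median"], summary["q3"])  # 4.0 5.0 8.0
print(boxplot_outliers(sample))                         # [30]
```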
- lineplot_bivariate(features: list[str], target: str, n_cols: int = 3)
Plots a line plot of each specified feature against the target variable, with denser x-axis ticks and a widened figure.
- plot_categorical_distributions(features: list[str], n_cols: int = 2)
Plots the distributions of specified categorical features as count plots.
- Parameters:
features (list of str) – List of categorical feature names to plot.
n_cols (int, optional) – Number of columns for the subplot grid. Default is 2.
- Return type:
None
- plot_correlation_matrix(size: str = 'small', numerical_df: DataFrame = None, title: str = '', save_plots: bool = False, save_path: str = '', **kwargs) → None
Plots the correlation matrix of the dataframe.
- Parameters:
size (str) – The size of the plot. Documented values are “s”, “m”, “l”, and “auto”; the signature default is 'small'.
- Return type:
None
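What a correlation matrix contains can be shown with a small pure-Python sketch. It assumes the Pearson coefficient (the default of pandas' `DataFrame.corr`); the source does not state which coefficient this method uses, and the `pearson` and `correlation_matrix` helpers and the sample columns are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns: dict[str, list[float]]) -> dict[str, dict[str, float]]:
    """Pairwise correlations between every pair of numerical columns."""
    return {
        a: {b: pearson(columns[a], columns[b]) for b in columns}
        for a in columns
    }


# Made-up numerical columns for illustration.
columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],   # rises with x
    "z": [4.0, 3.0, 2.0, 1.0],   # falls as x rises
}
matrix = correlation_matrix(columns)

for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

The matrix is symmetric with ones on the diagonal, which is why heatmap renderings are often masked to show only one triangle.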
- scatterplot_bivariate(features: list[str], target: str, n_cols: int = 3)
Plots scatter plots for each specified feature against the target variable.
The plots have an expanded figure size and enhanced x-axis ticks for better readability. If multiple features are provided, plots are arranged in a grid with the specified number of columns.
- Parameters:
features (list of str) – List of feature names to plot on the x-axis.
target (str) – The target variable name to plot on the y-axis.
n_cols (int, optional (default=3)) – Number of columns in the subplot grid.
- Return type:
None