library.phases.phases_implementation.EDA.EDA module

class library.phases.phases_implementation.EDA.EDA.EDA(dataset: Dataset)[source]

Bases: object

This class uses the ‘composition’ design pattern to create plots from the dataframe object held by an instance of the Dataset class. This design pattern allows the two classes to share data (e.g., the dataset object).
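As a rough illustration of the composition idea described above, the sketch below uses a hypothetical stand-in for Dataset and a hypothetical EDASketch class; neither is the library's actual implementation. The df attribute name is an assumption, and a plain dict stands in for the dataframe. It only shows that the composed object reads the same data the dataset holds.

```python
class Dataset:
    """Hypothetical stand-in for the library's Dataset class."""

    def __init__(self, df):
        # Assumption: the dataset exposes its data through a `df` attribute.
        # A dict of columns stands in for a dataframe here.
        self.df = df


class EDASketch:
    """Hypothetical class that holds a Dataset instead of inheriting from it."""

    def __init__(self, dataset):
        # Composition: keep a reference to the dataset, do not copy its data.
        self.dataset = dataset

    def feature_names(self):
        # Reads through the shared dataset object on every call.
        return list(self.dataset.df.keys())


dataset = Dataset({"age": [31, 45], "income": [40000, 52000]})
eda = EDASketch(dataset)

# A change made through the dataset is visible from the EDA object,
# because both refer to the same underlying data.
dataset.df["city"] = ["Oslo", "Lima"]
names = eda.feature_names()
```

Holding a reference rather than subclassing keeps the plotting code separate from data loading while both still operate on the same object.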

barplot_bivariate(features: list[str], target: str, n_cols: int = 3)[source]

Plots bar plots for each specified feature against the target variable.

The function adjusts the figure size for better visibility and optimizes the x-axis ticks, including handling interval-type features by converting them to strings. Plots are arranged in a grid layout based on the specified number of columns.

Parameters:
  • features (list of str) – List of feature names to plot on the x-axis.

  • target (str) – The target variable name to plot on the y-axis.

  • n_cols (int, optional (default=3)) – Number of columns in the subplot grid.

Return type:

None
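The grid arrangement mentioned above can be sketched as follows. This is a minimal illustration of how a row count might be derived from the number of features and n_cols; the helper name grid_shape is hypothetical and the library's own layout logic may differ.

```python
import math


def grid_shape(n_plots, n_cols=3):
    """Return (n_rows, n_cols) for a subplot grid that fits n_plots panels."""
    if n_plots < 1 or n_cols < 1:
        raise ValueError("n_plots and n_cols must be positive")
    # Do not allocate more columns than there are plots.
    n_cols = min(n_cols, n_plots)
    # Round up so that a partially filled last row still gets created.
    n_rows = math.ceil(n_plots / n_cols)
    return n_rows, n_cols


# Seven features with three columns need three rows.
n_rows, n_cols = grid_shape(7, 3)
# Panels left over in the last row, which a plotting routine would typically hide.
unused = n_rows * n_cols - 7
```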

count_boxplot_descriptive(features: list[str])[source]

Plots the distribution histogram, boxplot, and descriptive statistics summary for each specified feature.

Parameters:

features (list of str) – List of feature names to analyze and plot.

Return type:

None
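To give a sense of what a descriptive statistics summary for one feature can contain, here is a small pandas sketch. The helper describe_feature and the choice of statistics are assumptions made for illustration; the method's actual output is not specified in this reference.

```python
import pandas as pd


def describe_feature(series: pd.Series) -> dict:
    """Collect the statistics that a boxplot usually visualizes."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    return {
        "count": int(series.count()),
        "mean": float(series.mean()),
        "median": float(series.median()),
        "q1": float(q1),
        "q3": float(q3),
        # Interquartile range: the height of the box in a boxplot.
        "iqr": float(q3 - q1),
    }


ages = pd.Series([1, 2, 3, 4, 5], name="age")
summary = describe_feature(ages)
```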

lineplot_bivariate(features: list[str], target: str, n_cols: int = 3)[source]

Plots a line plot of each specified feature against the target, with maximized x-axis ticks and a stretched figure size.

plot_categorical_distributions(features: list[str], n_cols: int = 2)[source]

Plots the distributions of specified categorical features as count plots.

Parameters:
  • features (list of str) – List of categorical feature names to plot.

  • n_cols (int, optional) – Number of columns for the subplot grid. Default is 2.

Return type:

None

plot_correlation_matrix(size: str = 'small', numerical_df: DataFrame = None, title: str = '', save_plots: bool = False, save_path: str = '', **kwargs) → None[source]

Plots the correlation matrix of the dataframe.

Parameters:

size (str) – The size of the plot. One of “s”, “m”, “l”, or “auto”.

Return type:

None
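For context, the matrix that such a plot displays can be computed with pandas alone. The snippet below is a sketch under the assumption that only numeric columns enter the correlation, as the numerical_df parameter name suggests; the column names and data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [1, 2, 3, 4],
        "y": [2, 4, 6, 8],   # perfectly correlated with x
        "z": [4, 3, 2, 1],   # perfectly anti-correlated with x
        "label": ["a", "b", "a", "b"],  # non-numeric, excluded below
    }
)

# Keep only numeric columns before computing correlations.
numerical_df = df.select_dtypes(include="number")

# Pairwise Pearson correlation (the pandas default method).
corr = numerical_df.corr()
```

The resulting square DataFrame is what a heatmap-style plot would render, one cell per pair of numeric features.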

scatterplot_bivariate(features: list[str], target: str, n_cols: int = 3)[source]

Plots scatter plots for each specified feature against the target variable.

The plots have an expanded figure size and enhanced x-axis ticks for better readability. If multiple features are provided, plots are arranged in a grid with the specified number of columns.

Parameters:
  • features (list of str) – List of feature names to plot on the x-axis.

  • target (str) – The target variable name to plot on the y-axis.

  • n_cols (int, optional (default=3)) – Number of columns in the subplot grid.

Return type:

None