The clusteror.plot Module

This module provides plotting tools for illustrating and comparing clustering results.

clusteror.plot.group_occurance_plot(one_dim_data, cat_label, labels, group_label, colors=None, figsize=(10, 6), bbox_to_anchor=(1.01, 1), loc=2, grid=True, show=True, filepath=None, **kwargs)

Plot the distribution of one-dimensional ordinal or categorical data in a bar chart, broken down by cluster. This tool is useful for checking the clustering impact in this one-dimensional subspace.

Parameters:
  • one_dim_data (list, Pandas Series, Numpy Array, or any iterable) – A sequence of data. Each element is for one instance.
  • cat_label (str) – Field name to use for the one-dimensional data.
  • labels (list, Pandas Series, Numpy Array, or any iterable) – The segment label for each sample in one_dim_data.
  • group_label (str) – Field name to use for the cluster ID.
  • colors (list, default None) – Colours for each category present in the one-dimensional data. The default colour scheme is used if not supplied.
  • figsize (tuple) – Figure size (width, height).
  • bbox_to_anchor (tuple) – Where to place the legend box relative to the axes. See the Matplotlib documentation for details.
  • loc (int) – The corner of the legend box to anchor. See the Matplotlib documentation for details.
  • grid (boolean, default True) – Show grid.
  • show (boolean, default True) – Show the figure in a pop-up window if True; save it to a file if False.
  • filepath (str) – File path for saving the plot. Must be a valid file path if show is False.
  • **kwargs (keyword arguments) – Other keyword arguments passed on to matplotlib.pyplot.scatter.

Note

Instances in the same cluster do not necessarily group together in every one-dimensional subspace. Some features may have no clustering capability at all, and others play only a secondary role in clustering, as reflected by their lower importance in field_importance in the clusteror module.
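
The bar chart summarises how many instances of each category fall into each cluster. A minimal sketch of that tally in plain Python follows; category_counts_by_group is a hypothetical helper, not part of clusteror, and it draws nothing. It only assumes that one_dim_data and labels are aligned element by element, as the parameters above describe.

```python
from collections import Counter


def category_counts_by_group(one_dim_data, labels):
    """Count occurrences of each category within each cluster.

    Hypothetical helper, not part of clusteror. Returns
    {cluster_label: {category: count}}, i.e. the heights a grouped bar
    chart would display. Clusters and categories keep first-seen order.
    """
    values = list(one_dim_data)  # accept any iterable, including generators
    groups = list(labels)
    if len(values) != len(groups):
        raise ValueError("one_dim_data and labels must have the same length")
    counts = {}
    for value, label in zip(values, groups):
        counts.setdefault(label, Counter())[value] += 1
    # Plain dicts keep the result easy to print and compare.
    return {label: dict(counter) for label, counter in counts.items()}


colours = ["red", "blue", "red", "green", "blue", "red"]
clusters = [0, 0, 0, 1, 1, 1]
print(category_counts_by_group(colours, clusters))
# {0: {'red': 2, 'blue': 1}, 1: {'green': 1, 'blue': 1, 'red': 1}}
```

If every cluster shows roughly the same mix of categories, this feature contributes little to separating the clusters, which is the situation the note above describes.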

clusteror.plot.hist_plot_one_dim_group_data(one_dim_data, labels, bins=11, colors=None, figsize=(10, 6), xlabel='Dimension Reduced Data', ylabel='Occurance', bbox_to_anchor=(1.01, 1), loc=2, grid=True, show=True, filepath=None, **kwargs)

Plot the distribution of one-dimensional numerical data in a histogram, broken down by cluster. This tool is useful for checking the clustering impact in this one-dimensional subspace.

Parameters:
  • one_dim_data (list, Pandas Series, Numpy Array, or any iterable) – A sequence of data. Each element is for one instance.
  • labels (list, Pandas Series, Numpy Array, or any iterable) – The segment label for each sample in one_dim_data.
  • bins (int or iterable) – If an integer, bins - 1 bins are created. If an iterable, it is the list of bin delimiters.
  • colors (list, default None) – Colours for each group. If not supplied, equally spaced colours from a colour map are used.
  • figsize (tuple) – Figure size (width, height).
  • xlabel (str) – Plot xlabel.
  • ylabel (str) – Plot ylabel.
  • bbox_to_anchor (tuple) – Where to place the legend box relative to the axes. See the Matplotlib documentation for details.
  • loc (int) – The corner of the legend box to anchor. See the Matplotlib documentation for details.
  • grid (boolean, default True) – Show grid.
  • show (boolean, default True) – Show the figure in a pop-up window if True; save it to a file if False.
  • filepath (str) – File path for saving the plot. Must be a valid file path if show is False.
  • **kwargs (keyword arguments) – Other keyword arguments passed on to matplotlib.pyplot.scatter.

Note

Instances in the same cluster do not necessarily group together in every one-dimensional subspace. Some features may have no clustering capability at all, and others play only a secondary role in clustering, as reflected by their lower importance in field_importance in the clusteror module.
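
The per-cluster counts behind such a histogram can be sketched without Matplotlib. The helpers below are hypothetical, not part of clusteror. They assume that an integer bins means bins evenly spaced delimiters between the minimum and maximum of the data, which yields bins - 1 intervals as described for the bins parameter; the exact edges clusteror computes may differ. As in numpy.histogram, the last interval is closed and values outside the delimiters are ignored.

```python
import numbers
from bisect import bisect_right


def bin_edges(values, bins):
    """Return bin delimiters for `values` (hypothetical helper).

    Assumption: an integer `bins` gives `bins` evenly spaced delimiters from
    min(values) to max(values), hence bins - 1 intervals. An iterable is taken
    as the delimiters themselves.
    """
    if isinstance(bins, numbers.Integral):
        if bins < 2:
            raise ValueError("an integer bins must be at least 2")
        lo, hi = min(values), max(values)
        step = (hi - lo) / (bins - 1)
        # Set the last edge to hi exactly to avoid floating-point drift.
        return [lo + i * step for i in range(bins - 1)] + [hi]
    return sorted(bins)


def hist_counts_by_group(one_dim_data, labels, bins=11):
    """Per-cluster histogram counts (hypothetical helper, not clusteror's)."""
    values = [float(v) for v in one_dim_data]
    edges = bin_edges(values, bins)
    n_intervals = len(edges) - 1
    counts = {}
    for value, label in zip(values, labels):
        row = counts.setdefault(label, [0] * n_intervals)
        # Intervals are [edges[i], edges[i + 1]); the last one is closed.
        i = bisect_right(edges, value) - 1
        if i == n_intervals and value == edges[-1]:
            i -= 1
        if 0 <= i < n_intervals:  # values outside the delimiters are ignored
            row[i] += 1
    return edges, counts


data = [0.0, 1.0, 2.0, 3.0, 4.0, 4.0]
groups = ["a", "a", "a", "b", "b", "b"]
edges, counts = hist_counts_by_group(data, groups, bins=5)
print(edges)   # [0.0, 1.0, 2.0, 3.0, 4.0]
print(counts)  # {'a': [1, 1, 1, 0], 'b': [0, 0, 0, 3]}
```

Here bins=5 produces five delimiters and therefore four intervals, and the two clusters occupy disjoint ranges of the data.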

clusteror.plot.scatter_plot_two_dim_group_data(two_dim_data, labels, markers=None, colors=None, figsize=(10, 6), xlim=None, ylim=None, alpha=0.8, bbox_to_anchor=(1.01, 1), loc=2, grid=True, show=True, filepath=None, **kwargs)

Plot the distribution of two-dimensional data against cluster groups in a scatter plot.

Each point represents an instance in the dataset. Points in the same cluster are painted in the same colour.

This tool is useful for checking the clustering impact in this two-dimensional subspace.

Parameters:
  • two_dim_data (Pandas DataFrame) – A dataframe with two columns. The first column goes to the x-axis, and the second column goes to the y-axis.
  • labels (list, Pandas Series, Numpy Array, or any iterable) – The segment label for each sample in two_dim_data.
  • markers (list) – Marker names for each group.
  • colors (list, default None) – Colours for each group. If not supplied, equally spaced colours from a colour map are used.
  • figsize (tuple) – Figure size (width, height).
  • xlim (tuple) – X-axis limits.
  • ylim (tuple) – Y-axis limits.
  • alpha (float, between 0 and 1) – Marker transparency, from 0 (transparent) to 1 (opaque).
  • bbox_to_anchor (tuple) – Where to place the legend box relative to the axes. See the Matplotlib documentation for details.
  • loc (int) – The corner of the legend box to anchor. See the Matplotlib documentation for details.
  • grid (boolean, default True) – Show grid.
  • show (boolean, default True) – Show the figure in a pop-up window if True; save it to a file if False.
  • filepath (str) – File path for saving the plot. Must be a valid file path if show is False.
  • **kwargs (keyword arguments) – Other keyword arguments passed on to matplotlib.pyplot.scatter.

Note

Instances in the same cluster do not necessarily group together in every two-dimensional subspace. Some features may have no clustering capability at all, and others play only a secondary role in clustering, as reflected by their lower importance in field_importance in the clusteror module.
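
Drawing such a scatter plot amounts to plotting each cluster's points as one series with its own colour and marker. A minimal sketch of that preparation step follows; split_points_by_group and evenly_spaced_positions are hypothetical helpers, not clusteror functions. The input is a sequence of (x, y) rows; for a two-column DataFrame, df.itertuples(index=False) yields such rows. The even spacing of colour-map positions is an assumption about the default colouring described for colors.

```python
def split_points_by_group(rows, labels):
    """Split (x, y) rows into per-cluster coordinate lists.

    Hypothetical helper, not part of clusteror. Returns
    {cluster_label: (xs, ys)}; each entry could be drawn with one scatter
    call using that group's colour and marker. Groups keep first-seen order.
    """
    groups = {}
    for (x, y), label in zip(rows, labels):
        xs, ys = groups.setdefault(label, ([], []))
        xs.append(x)
        ys.append(y)
    return groups


def evenly_spaced_positions(n_groups):
    """Equally spaced positions in [0, 1] for picking one colour per group
    from a colour map (assumed default; clusteror's spacing may differ)."""
    if n_groups == 1:
        return [0.0]
    return [i / (n_groups - 1) for i in range(n_groups)]


rows = [(0.1, 0.2), (0.9, 1.1), (0.2, 0.1), (1.0, 0.8)]
labels = [0, 1, 0, 1]
groups = split_points_by_group(rows, labels)
print(groups)  # {0: ([0.1, 0.2], [0.2, 0.1]), 1: ([0.9, 1.0], [1.1, 0.8])}
print(evenly_spaced_positions(len(groups)))  # [0.0, 1.0]
```

A Matplotlib colour map object is callable on each of these positions and returns an RGBA tuple, which could then serve as the colour of the corresponding group.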