The clusteror.plot Module
Plotting tools relevant for illustrating and comparing clustering results can be found in this module.
clusteror.plot.group_occurance_plot(one_dim_data, cat_label, labels, group_label, colors=None, figsize=(10, 6), bbox_to_anchor=(1.01, 1), loc=2, grid=True, show=True, filepath=None, **kwargs)
Plot the distribution of one-dimensional ordinal or categorical data in a bar chart. This tool is useful for checking the clustering impact in this one-dimensional sub-space.
Parameters:
- one_dim_data (list, Pandas Series, Numpy Array, or any iterable) – A sequence of data. Each element corresponds to one instance.
- cat_label (str) – Field name to use for the one-dimensional data.
- labels (list, Pandas Series, Numpy Array, or any iterable) – The segment label for each sample in one_dim_data.
- group_label (str) – Field name to use for the cluster ID.
- colors (list, default None) – Colours for each category present in the one-dimensional data. The default colour scheme is used if not supplied.
- figsize (tuple) – Figure size (width, height).
- bbox_to_anchor (tuple) – Where to place the legend box relative to the axes. Refer to the Matplotlib documentation for details.
- loc (int) – The corner of the legend box to anchor. Refer to the Matplotlib documentation for details.
- grid (boolean, default True) – Show grid.
- show (boolean, default True) – Show the figure in a pop-up window if True. Save it to a file if False.
- filepath (str) – File name for saving the plot. Must be a valid file path if show is False.
- **kwargs (keyword arguments) – Other keyword arguments passed on to matplotlib.pyplot.scatter.
Note
Instances in the same cluster do not necessarily group together in every one-dimensional sub-space. Some features may have no clustering capability at all. Additionally, certain features play a secondary role in clustering, as indicated by their lower importance in field_importance in the clusteror module.
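The quantity that a bar chart of this kind summarises can be sketched without any plotting: count how often each category occurs inside each cluster. The helper name `category_counts_by_cluster`, the sample data, and the output file name are illustrative assumptions and not part of clusteror. The call inside `plot_example` relies only on the signature documented above and is not executed here.

```python
from collections import Counter, defaultdict


def category_counts_by_cluster(one_dim_data, labels):
    """Count occurrences of each category within each cluster.

    Hypothetical helper for illustration; clusteror's internals may differ.
    """
    counts = defaultdict(Counter)
    for value, label in zip(one_dim_data, labels):
        counts[label][value] += 1
    return {label: dict(counter) for label, counter in counts.items()}


# Made-up sample: one categorical feature and one cluster ID per instance.
sizes = ["S", "M", "M", "L", "S", "L", "L"]
cluster_ids = [0, 0, 0, 1, 0, 1, 1]
counts = category_counts_by_cluster(sizes, cluster_ids)


def plot_example():
    """Save the corresponding chart; requires clusteror to be installed."""
    from clusteror.plot import group_occurance_plot

    group_occurance_plot(
        sizes, "size", cluster_ids, "cluster",
        show=False, filepath="size_by_cluster.png",  # hypothetical file name
    )
```

In this sample, cluster 1 contains only "L" while cluster 0 contains only "S" and "M", so the feature separates the two clusters. A feature with similar counts in every cluster would show little clustering capability.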
clusteror.plot.hist_plot_one_dim_group_data(one_dim_data, labels, bins=11, colors=None, figsize=(10, 6), xlabel='Dimension Reduced Data', ylabel='Occurance', bbox_to_anchor=(1.01, 1), loc=2, grid=True, show=True, filepath=None, **kwargs)
Plot the distribution of one-dimensional numerical data in a histogram. This tool is useful for checking the clustering impact in this one-dimensional sub-space.
Parameters:
- one_dim_data (list, Pandas Series, Numpy Array, or any iterable) – A sequence of data. Each element corresponds to one instance.
- labels (list, Pandas Series, Numpy Array, or any iterable) – The segment label for each sample in one_dim_data.
- bins (int or iterable) – If an integer, bins - 1 bins are created. If an iterable, it is used as the list of bin delimiters.
- colors (list, default None) – Colours for each group. Equally spaced colours from the colour map are used if not supplied.
- figsize (tuple) – Figure size (width, height).
- xlabel (str) – Plot xlabel.
- ylabel (str) – Plot ylabel.
- bbox_to_anchor (tuple) – Where to place the legend box relative to the axes. Refer to the Matplotlib documentation for details.
- loc (int) – The corner of the legend box to anchor. Refer to the Matplotlib documentation for details.
- grid (boolean, default True) – Show grid.
- show (boolean, default True) – Show the figure in a pop-up window if True. Save it to a file if False.
- filepath (str) – File name for saving the plot. Must be a valid file path if show is False.
- **kwargs (keyword arguments) – Other keyword arguments passed on to matplotlib.pyplot.scatter.
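The relationship between an integer bins value and the number of bins can be illustrated with a small sketch. It assumes the delimiters are equally spaced between the minimum and maximum of the data, which the description above does not state explicitly; `bin_edges` is a hypothetical helper, not a clusteror function.

```python
def bin_edges(data, bins=11):
    """Return bin delimiters for one-dimensional numerical data.

    An integer `bins` is read as the number of delimiters, which yields
    `bins - 1` intervals. Equal spacing between min and max is an assumption.
    Any other iterable is taken as the delimiters themselves.
    """
    if isinstance(bins, int):
        if bins < 2:
            raise ValueError("at least two delimiters are needed for one bin")
        low, high = min(data), max(data)
        step = (high - low) / (bins - 1)
        return [low + i * step for i in range(bins)]
    return list(bins)


edges = bin_edges([0, 10], bins=11)
number_of_bins = len(edges) - 1
```

With the default of 11, the data range is cut by 11 delimiters into 10 bins.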
Note
Instances in the same cluster do not necessarily group together in every one-dimensional sub-space. Some features may have no clustering capability at all. Additionally, certain features play a secondary role in clustering, as indicated by their lower importance in field_importance in the clusteror module.
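As a rough illustration of what a per-cluster histogram counts, the sketch below tallies values into bins separately for each cluster label. The function name `grouped_histogram` and the sample values are assumptions made for this example; clusteror's own implementation may differ.

```python
from bisect import bisect_right
from collections import defaultdict


def grouped_histogram(one_dim_data, labels, edges):
    """Count values per bin, separately for each cluster label.

    `edges` must be sorted delimiters. Values outside the outermost
    delimiters are ignored. Hypothetical helper, not clusteror's code.
    """
    n_bins = len(edges) - 1
    counts = defaultdict(lambda: [0] * n_bins)
    for value, label in zip(one_dim_data, labels):
        if value < edges[0] or value > edges[-1]:
            continue
        # A value equal to the last delimiter belongs to the final bin.
        index = min(bisect_right(edges, value) - 1, n_bins - 1)
        counts[label][index] += 1
    return dict(counts)


# Made-up sample: dimension-reduced values and their cluster IDs.
values = [0.1, 0.2, 0.9, 1.0]
cluster_ids = [0, 0, 1, 1]
hist = grouped_histogram(values, cluster_ids, [0.0, 0.5, 1.0])
```

When the clusters occupy different bins, as in this sample, the feature separates them well in this one-dimensional sub-space. Heavily overlapping counts would indicate the opposite.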
clusteror.plot.scatter_plot_two_dim_group_data(two_dim_data, labels, markers=None, colors=None, figsize=(10, 6), xlim=None, ylim=None, alpha=0.8, bbox_to_anchor=(1.01, 1), loc=2, grid=True, show=True, filepath=None, **kwargs)
Plot the distribution of two-dimensional data against clustering groups in a scatter plot.
A point represents an instance in the dataset. Points in the same cluster are painted in the same colour.
This tool is useful for checking the clustering impact in this two-dimensional sub-space.
Parameters:
- two_dim_data (Pandas DataFrame) – A dataframe with two columns. The first column goes to the x-axis, and the second column goes to the y-axis.
- labels (list, Pandas Series, Numpy Array, or any iterable) – The segment label for each sample in two_dim_data.
- markers (list) – Marker names for each group.
- bbox_to_anchor (tuple) – Where to place the legend box relative to the axes. Refer to the Matplotlib documentation for details.
- colors (list, default None) – Colours for each group. Equally spaced colours from the colour map are used if not supplied.
- figsize (tuple) – Figure size (width, height).
- xlim (tuple) – X-axis limits.
- ylim (tuple) – Y-axis limits.
- alpha (float, between 0 and 1) – Marker transparency. From 0 to 1: from transparent to opaque.
- loc (int) – The corner of the legend box to anchor. Refer to the Matplotlib documentation for details.
- grid (boolean, default True) – Show grid.
- show (boolean, default True) – Show the figure in a pop-up window if True. Save it to a file if False.
- filepath (str) – File name for saving the plot. Must be a valid file path if show is False.
- **kwargs (keyword arguments) – Other keyword arguments passed on to matplotlib.pyplot.scatter.
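The idea of picking equally spaced colours for the groups when colors is not supplied can be sketched as follows. Which colour map clusteror actually samples is not stated here, so this example swaps in the HSV hue circle from the standard library as a stand-in; `equally_spaced_colours` is a hypothetical name.

```python
import colorsys


def equally_spaced_colours(n_groups):
    """Return `n_groups` RGB tuples at equal hue intervals.

    Stand-in for sampling a colour map at equal distances. The HSV hue
    circle is an assumption chosen for illustration. Hues are spaced by
    1 / n_groups because hue 0 and hue 1 are the same colour.
    """
    return [
        colorsys.hsv_to_rgb(i / n_groups, 1.0, 1.0)
        for i in range(n_groups)
    ]


colours = equally_spaced_colours(3)
```

Each group then receives one entry of the list, in the same way that an explicit colors list would be matched to groups.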
Note
Instances in the same cluster do not necessarily group together in every two-dimensional sub-space. Some features may have no clustering capability at all. Additionally, certain features play a secondary role in clustering, as indicated by their lower importance in field_importance in the clusteror module.
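A scatter plot coloured by cluster amounts to drawing one series of points per cluster label. The sketch below shows that grouping step with the standard library only. The helper `split_points_by_cluster`, the sample points, the column names, and the output file name are assumptions for illustration. The call inside `plot_example` uses only the signature documented above and is not executed here.

```python
from collections import defaultdict


def split_points_by_cluster(points, labels):
    """Group (x, y) pairs into one (xs, ys) series per cluster label.

    Hypothetical helper for illustration; clusteror's internals may differ.
    """
    series = defaultdict(lambda: ([], []))
    for (x, y), label in zip(points, labels):
        xs, ys = series[label]
        xs.append(x)
        ys.append(y)
    return dict(series)


# Made-up sample: four instances in a two-dimensional sub-space.
points = [(0.1, 0.2), (0.2, 0.1), (0.9, 0.8), (1.0, 0.9)]
cluster_ids = [0, 0, 1, 1]
series = split_points_by_cluster(points, cluster_ids)


def plot_example():
    """Save the scatter plot; requires pandas and clusteror."""
    import pandas as pd
    from clusteror.plot import scatter_plot_two_dim_group_data

    # The first column goes to the x-axis, the second to the y-axis.
    two_dim_data = pd.DataFrame(points, columns=["x", "y"])
    scatter_plot_two_dim_group_data(
        two_dim_data, cluster_ids,
        show=False, filepath="clusters.png",  # hypothetical file name
    )
```

Passing show=False together with a filepath follows the rule stated in the parameter list: a file path is required whenever the figure is not shown.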