Seaborn

Seaborn is a statistical visualization library built on top of matplotlib. It generates complex plots from pandas DataFrames with simple commands. Let us consider the CHD_test.csv dataset:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Load the dataset and preview the first rows
CHD = pd.read_csv('./data/CHD_test.csv', index_col=False)
CHD.head()

Histogram

Standardize the 'median_income' and 'median_house_value' columns and plot their histograms:

import seaborn as sns
sns.set_theme(color_codes=True)

# Standardize each column to zero mean and unit standard deviation
for col in ['median_income', 'median_house_value']:
    CHD[col] = (CHD[col] - CHD[col].mean()) / CHD[col].std()

# Overlay the two histograms; alpha keeps both visible where they overlap
for col in ['median_income', 'median_house_value']:
    plt.hist(CHD[col], density=True, alpha=0.5, label=col)
plt.legend()
plt.show(block=False)
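
The standardization step above is a z-score transform: subtract the column mean, then divide by the column standard deviation. As a minimal sketch on synthetic data (the column name `value` and the helper `standardize` are hypothetical and not part of CHD_test.csv), the result should have a mean close to 0 and a standard deviation close to 1:

```python
import numpy as np
import pandas as pd


def standardize(series: pd.Series) -> pd.Series:
    """Return the z-scores of a numeric series (hypothetical helper)."""
    return (series - series.mean()) / series.std()


# Synthetic data standing in for a real column; the values are made up
rng = np.random.default_rng(0)
demo = pd.DataFrame({"value": rng.normal(loc=50.0, scale=10.0, size=1000)})

z = standardize(demo["value"])

# After standardization the mean is ~0 and the standard deviation is ~1
print(abs(z.mean()) < 1e-9, abs(z.std() - 1.0) < 1e-9)
```

Because both columns end up on the same scale, their histograms can share one set of axes.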

We can get a smooth estimate of the joint distribution using kernel density estimation (KDE):

import warnings
warnings.filterwarnings("ignore")
sns.kdeplot(data=CHD, x='median_income', y='median_house_value')
plt.show(block=False)
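
To make the idea behind KDE concrete, here is a rough one-dimensional sketch written with NumPy only. It places a Gaussian bump on every sample and averages the bumps. The helper `gaussian_kde_1d`, the synthetic samples, and the fixed bandwidth of 0.4 are all assumptions made for illustration; seaborn chooses its own bandwidth and the call above is bivariate.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Evaluate a Gaussian kernel density estimate on `grid` (hypothetical helper)."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance between every grid point and every sample
    u = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    # Average the kernels over the samples and rescale by the bandwidth
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(1)
samples = rng.normal(size=200)  # synthetic data, not the CHD columns
grid = np.linspace(-6.0, 6.0, 601)
density = gaussian_kde_1d(samples, grid, bandwidth=0.4)

# A valid density estimate integrates to approximately 1
area = float(np.sum(density) * (grid[1] - grid[0]))
print(area)
```

A smaller bandwidth follows the samples more closely, while a larger one gives a smoother curve.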

You can create a hexagonally binned histogram using jointplot:

sns.jointplot(data=CHD, x='median_income', y='median_house_value', kind="hex")

Setting kind="kde" draws density contours instead, and hue colors them by family level:

sns.jointplot(data=CHD, x='median_income', y='median_house_value', kind="kde", hue='famlev')

The following illustrates how to draw a box plot for different family levels.

g = sns.catplot(data=CHD, x='median_income', y='famlev', kind="box")
g.set_axis_labels("Income", "Family level");
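
The box in a box plot spans the first to the third quartile of each group, with a line at the median. The following sketch computes those numbers directly with pandas on synthetic data; the column names `level` and `income` and the group labels are hypothetical stand-ins for the CHD columns.

```python
import numpy as np
import pandas as pd

# Synthetic groups with clearly separated centers; values are made up
rng = np.random.default_rng(2)
demo = pd.DataFrame({
    "level": np.repeat(["low", "mid", "high"], 100),
    "income": np.concatenate([
        rng.normal(2.0, 0.5, 100),
        rng.normal(4.0, 0.5, 100),
        rng.normal(6.0, 0.5, 100),
    ]),
})

# One row per group, one column per quartile (0.25, 0.5, 0.75)
quartiles = (
    demo.groupby("level")["income"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(quartiles)
```

Comparing the rows of this table gives the same information as comparing the boxes in the plot.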

Pairplots

Pair plots generalize joint plots to multidimensional data, which is very useful for exploring correlations between pairs of variables.

sns.pairplot(CHD, hue='famlev');
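
A pair plot shows correlations visually. A numerical complement is the correlation matrix from pandas. The sketch below uses synthetic columns `a`, `b`, and `c` (hypothetical names, not columns of CHD), where `b` is built from `a` and `c` is independent of it:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
a = rng.normal(size=500)
demo = pd.DataFrame({
    "a": a,
    "b": 2.0 * a + rng.normal(scale=0.5, size=500),  # strongly related to a
    "c": rng.normal(size=500),                       # independent of a
})

# Pearson correlation between every pair of columns
corr = demo.corr()
print(corr.round(2))
```

Pairs with a correlation near 1 or -1 appear as narrow diagonal clouds in the corresponding pair plot panel, and pairs near 0 appear as round clouds.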

Joyplot

A joyplot (also called a ridgeline plot) is useful for comparing distributions. The following shows how to draw one.

sns.set_theme(style="white", rc={"axes.facecolor": (0, 0, 0, 0)})

Create the data

rs = np.random.RandomState(1979)
x = rs.randn(500)
g = np.tile(list("ABCDEFGHIJ"), 50)
df = pd.DataFrame(dict(x=x, g=g))
m = df.g.map(ord)
df["x"] += m
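
The `map(ord)` step above converts each group letter to its character code, so consecutive groups are shifted by one unit along x. A small sketch with three hypothetical letters shows the offsets:

```python
import pandas as pd

# Three example group labels; ord() returns each letter's character code
letters = pd.Series(list("ABC"))
offsets = letters.map(ord)

# Consecutive letters differ by exactly 1, which staggers the group means
steps = offsets.diff().dropna()
print(offsets.tolist(), steps.tolist())
```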

Initialize the FacetGrid object

pal = sns.cubehelix_palette(10, rot=-.25, light=.7)
g = sns.FacetGrid(df, row="g", hue="g", aspect=15, height=.5, palette=pal)

Draw the densities in a few steps

g.map(sns.kdeplot, "x",bw_adjust=.5, clip_on=False,fill=True, alpha=1, linewidth=1.5)
g.map(sns.kdeplot, "x", clip_on=False, color="w", lw=2, bw_adjust=.5)

Define and use a simple function to label the plot in axes coordinates

def label(x, color, label):
    ax = plt.gca()
    ax.text(0, .2, label, fontweight="bold", color=color,
            ha="left", va="center", transform=ax.transAxes)
g.map(label, "x")

Set the subplots to overlap

g.figure.subplots_adjust(hspace=-.25)

Remove axes details that don't play well with overlap

g.set_titles("")
g.set(yticks=[], ylabel="")
g.despine(bottom=True, left=True)
plt.show(block=False)

Displot

The displot function can be used to visualize the univariate or bivariate distribution of data:

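import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Two independent samples from a standard normal distribution
data1 = np.random.normal(size=100)
data2 = np.random.normal(size=100)

# Wide-form DataFrame: displot draws one histogram (with a KDE curve) per column
df = pd.DataFrame({"A": data1, "B": data2})
sns.displot(data=df, kde=True)
plt.show(block=False)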