If you're starting out in data science, there are three distribution charts worth learning first: the histogram, the KDE plot, and the box plot. Together they cover most everyday data exploration. Let's dive in!


1. Histogram — Your Distribution Foundation

What it does: Shows how frequently data falls into different ranges (bins). It's the first chart you should always make when exploring a new dataset.

When to use it: To quickly understand the shape, center, and spread of your data. Is it normal? Skewed? Bimodal?

import matplotlib.pyplot as plt
import numpy as np

# Generate sample data
data = np.random.normal(loc=50, scale=10, size=1000)

# Basic Histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black')
plt.title('Histogram of Data')
plt.xlabel('Value')
plt.ylabel('Frequency')
plt.show()

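Pro Tip: Use bins='auto' to let NumPy pick the bin count for you (it takes the finer of the Sturges and Freedman-Diaconis estimates), or try bins='fd' (Freedman-Diaconis), which is based on the IQR and so is less sensitive to outliers.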
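
Curious how much the bin rules actually differ? Here is a minimal sketch using np.histogram_bin_edges, which accepts the same rule names as plt.hist. The seeded sample is an illustrative stand-in for the data array above, not part of the original example.

```python
import numpy as np

# Illustrative sample (assumption: same distribution as `data` above, seeded for repeatability)
rng = np.random.default_rng(42)
sample = rng.normal(loc=50, scale=10, size=1000)

# Count how many bins each named rule would produce for this sample
bin_counts = {
    rule: len(np.histogram_bin_edges(sample, bins=rule)) - 1
    for rule in ("sturges", "fd", "auto")
}

for rule, n_bins in bin_counts.items():
    print(f"{rule:>8}: {n_bins} bins")
```

Run it on your own column before plotting: if the rules disagree a lot, it is worth drawing the histogram with more than one setting.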


2. KDE Plot (Density Plot) — The Smooth Story

What it does: Creates a smooth, continuous curve that estimates the probability density of your data. Think of it as a "smoothed-out histogram."

When to use it: When you want to compare multiple distributions on the same plot without the visual clutter of bars. KDE is excellent for spotting subtle patterns like multiple peaks.

import seaborn as sns
import matplotlib.pyplot as plt

# KDE plot (reuses `data` from the histogram example)
sns.kdeplot(data, fill=True, color='purple')
plt.title('KDE (Density) Plot')
plt.xlabel('Value')
plt.ylabel('Density')
plt.show()

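Pro Tip: Use fill=True to shade under the curve for better readability. When comparing groups, put your data in a long-form DataFrame and pass hue the name of the grouping column, e.g. sns.kdeplot(data=df, x='value', hue='category'), to overlay one curve per group (df, 'value', and 'category' are placeholder names here).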
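
What does "smoothed-out histogram" mean in practice? A KDE places a small bump on every data point and averages the bumps. The sketch below does that by hand with NumPy, assuming a Gaussian bump and Scott's rule of thumb for the bandwidth. The function name gaussian_kde_1d and the sample are made up for illustration; this is not how seaborn is implemented internally, just the same idea in a few lines.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average one Gaussian bump per sample, evaluated at each grid point."""
    samples = np.asarray(samples, dtype=float)
    # z[i, j] = standardized distance from grid point i to sample j
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels.mean(axis=1)

# Illustrative sample (assumption: seeded stand-in for your own data)
rng = np.random.default_rng(0)
sample = rng.normal(loc=50, scale=10, size=500)

# Scott's rule of thumb for the bandwidth (an assumption here; other rules exist)
bandwidth = sample.std(ddof=1) * len(sample) ** (-1 / 5)

# Evaluate the curve on a grid that extends a little past the data
grid = np.linspace(sample.min() - 4 * bandwidth, sample.max() + 4 * bandwidth, 400)
density = gaussian_kde_1d(sample, grid, bandwidth)

# A density should integrate to roughly 1 over the grid
area = np.sum(density) * (grid[1] - grid[0])
print(f"bandwidth = {bandwidth:.2f}, area under curve = {area:.3f}")
```

A smaller bandwidth gives a bumpier curve and a larger one gives a flatter curve, which is why two KDE plots of the same data can look quite different.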


3. Box Plot — The Statistical Summary

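What it does: Displays the five-number summary (min, Q1, median, Q3, max), with one twist: points far from the box are drawn separately as outliers, so the whiskers end at the most extreme non-outlier values. It's the fastest way to spot weird data points.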

When to use it: To compare distributions across categories or quickly identify outliers and skewness.

import seaborn as sns
import matplotlib.pyplot as plt

# Box plot (reuses `data` from the histogram example)
sns.boxplot(y=data, color='lightgreen')
plt.title('Box Plot')
plt.show()

How to read it:

  • Box = middle 50% of data (IQR)
  • Line inside box = median
  • Whiskers = reach to the farthest data points within 1.5× IQR of the box edges
  • Dots beyond whiskers = outliers
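
The reading guide above can be turned into numbers. This is a minimal sketch of the usual 1.5× IQR convention, computed with NumPy on a tiny made-up sample. The helper name box_plot_stats is hypothetical, and plotting libraries may differ in details such as how they interpolate quartiles.

```python
import numpy as np

def box_plot_stats(values, whisker=1.5):
    """Return the numbers a default-style box plot draws, for one 1-D sample."""
    values = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    # Fences: the farthest a whisker is allowed to reach
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    inside = values[(values >= lower_fence) & (values <= upper_fence)]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        # Whiskers stop at the most extreme observed points inside the fences
        "whisker_low": inside.min(), "whisker_high": inside.max(),
        # Anything outside the fences is drawn as an individual dot
        "outliers": values[(values < lower_fence) | (values > upper_fence)],
    }

# Toy sample with one obviously odd value
stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 50])
for name, value in stats.items():
    print(f"{name}: {value}")
```

In this toy sample the upper fence sits at 14.5, so 50 is flagged as the only outlier and the upper whisker stops at 9 rather than at the maximum.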

Pro Tip: Always check box plots alongside histograms — box plots hide modality (multiple peaks), while histograms reveal the full shape.
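
To see why box plots hide modality, here is a small sketch with a deliberately two-peaked sample (an invented example, seeded so it is repeatable). It compares what the box plot summarizes, the quartiles, with what the histogram counts.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative bimodal sample: two well-separated clusters of equal size
bimodal = np.concatenate([
    rng.normal(loc=40, scale=3, size=500),
    rng.normal(loc=60, scale=3, size=500),
])

# What a box plot would summarize
q1, median, q3 = np.percentile(bimodal, [25, 50, 75])
print(f"Q1 = {q1:.1f}, median = {median:.1f}, Q3 = {q3:.1f}")

# What a histogram would count
counts, edges = np.histogram(bimodal, bins=20)
# Index of the bin that contains the median (clamped to the last bin)
median_bin = min(np.searchsorted(edges, median, side="right") - 1, len(counts) - 1)
print(f"tallest bin: {counts.max()} points")
print(f"bin containing the median: {counts[median_bin]} points")
```

The box would look perfectly ordinary, with the median line right in the middle, yet the histogram shows that almost no data actually sits near the median.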


The Power Combo: All Three Together

For a complete picture, use Seaborn's histplot() with kde=True to overlay a histogram with a KDE curve, then add a box plot beside it:

import seaborn as sns
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(12, 5))

# Histogram + KDE
sns.histplot(data, kde=True, ax=axes[0], color='steelblue')
axes[0].set_title('Histogram + KDE')

# Box Plot
sns.boxplot(y=data, ax=axes[1], color='coral')
axes[1].set_title('Box Plot')

plt.tight_layout()
plt.show()

Quick Decision Guide

Chart     | Best For           | Key Insight
Histogram | First look at data | Shape, skewness, bins
KDE Plot  | Comparing groups   | Smooth trends, overlaps
Box Plot  | Outlier detection  | Quartiles, spread, anomalies

Final Takeaway

Master these three charts and you'll have a solid foundation for understanding any dataset. Start with a histogram to see the big picture, add a KDE for smooth comparisons, and use a box plot to catch outliers. That's your 3-minute distribution toolkit!