Chapter 08 — Visualization

Data Visualization

Create clear and insightful charts. Choose the right chart for the right question.

8.0 Visualization decision guide
| Goal | Best chart | Why | Avoid / skip |
|---|---|---|---|
| Show one numeric distribution | Histogram + KDE | Reveals skew, modes, tails | Pie charts for numeric distributions |
| Compare groups on a numeric value | Boxplot/violin + strip | Shows median, spread, and outliers | Bars of group means with no spread information |
| Compare category totals | Bar chart | Best for discrete comparisons | Stacked bars with too many segments |
| Show trend over time | Line chart | Natural for temporal continuity | Line charts when the x-axis is unordered categories |
| Explore two numeric variables | Scatter/regplot | Shows pattern, clusters, outliers | Heavily overplotted scatters without alpha or binning |
Before publishing a chart: label units, start bars at zero, limit colors, and add a one-sentence interpretation. A chart without context is easy to misread.
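As a rough illustration of that checklist, the sketch below draws a single bar chart with a labeled unit, a zero baseline, one color, and a one-sentence caption. The `headcount` numbers are made up for the example and are not taken from any dataset in this chapter.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, for illustration only.
headcount = pd.Series(
    {"Engineering": 42, "Sales": 31, "Support": 18, "Finance": 9},
    name="employees",
)

fig, ax = plt.subplots(figsize=(7, 4))
# Limit colors: a single color is enough when color encodes nothing.
ax.bar(headcount.index, headcount.values, color="#4a90d9")

ax.set_ylim(bottom=0)                   # bars start at zero
ax.set_ylabel("Employees (headcount)")  # label the unit
ax.set_xlabel("Department")
ax.set_title("Employees per department")

# One-sentence interpretation, computed from the data and placed under the chart.
largest = headcount.idxmax()
share = headcount.max() / headcount.sum()
caption = f"{largest} accounts for {share:.0%} of all employees."
fig.text(0.5, 0.01, caption, ha="center", fontsize=9)

# Leave room at the bottom of the figure for the caption.
fig.tight_layout(rect=(0, 0.05, 1, 1))
# Call plt.show() (or fig.savefig(...)) to display or export the figure.
```

Computing the caption from the data, rather than typing it by hand, keeps the sentence in sync when the numbers change.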
8.1 Distribution charts
python
# Assumes `df` is a pandas DataFrame with 'salary' and 'department' columns
import matplotlib.pyplot as plt
import seaborn as sns

# Histogram with density curve
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
sns.histplot(df['salary'], bins=30, kde=True, ax=axes[0])
axes[0].set_title('Salary Distribution')

# Boxplot: shows median, IQR, and outliers
sns.boxplot(x='department', y='salary', data=df, ax=axes[1])
axes[1].set_title('Salary by Department')
axes[1].tick_params(axis='x', rotation=45)
plt.tight_layout()
plt.show()
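The decision guide also suggests a violin plot with a strip overlay for group comparisons. A minimal sketch of that combination follows, assuming seaborn is installed. The `demo` DataFrame is synthetic and stands in for the chapter's `df`.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic stand-in for df: 40 salaries per department (made-up parameters).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "department": np.repeat(["Engineering", "Sales", "Support"], 40),
    "salary": np.concatenate([
        rng.normal(95_000, 12_000, 40),
        rng.normal(70_000, 15_000, 40),
        rng.normal(55_000, 8_000, 40),
    ]),
})

fig, ax = plt.subplots(figsize=(8, 4))
# The violin shows the shape of each group's distribution.
sns.violinplot(x="department", y="salary", data=demo, color="lightgray", ax=ax)
# The strip overlay shows every individual observation on top of the violin.
sns.stripplot(x="department", y="salary", data=demo,
              size=3, alpha=0.6, color="black", ax=ax)

ax.set_ylabel("Salary (USD)")
ax.set_title("Salary by department")
fig.tight_layout()
# Call plt.show() to display the figure.
```

Showing the raw points makes small groups obvious, which a violin or boxplot alone can hide.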
8.2 Relationship charts
python
# Scatter plot with color grouping
sns.scatterplot(x='age', y='salary', hue='department', data=df, alpha=0.7)
plt.title('Age vs Salary by Department')
plt.show()

# Scatter plot with a fitted linear regression line and confidence band
sns.regplot(x='age', y='salary', data=df, scatter_kws={'alpha': 0.5})
plt.title('Age vs Salary with Linear Fit')
plt.show()

# Pairplot: every listed numeric column against every other
# (pairplot creates its own figure; 'category' is used only for color)
cols = ['age', 'salary', 'experience', 'score']
sns.pairplot(df[cols + ['category']], hue='category', diag_kind='kde')
plt.show()
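The decision guide warns against heavy overplotting without alpha or binning. The sketch below compares the two remedies side by side on synthetic data: a very low `alpha`, and matplotlib's `hexbin`, which counts points per hexagonal cell. The age and salary values are generated for the example.

```python
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data: enough points that a plain scatter becomes a solid blob.
rng = np.random.default_rng(42)
n = 20_000
age = rng.uniform(22, 65, n)
salary = 30_000 + 1_200 * (age - 22) + rng.normal(0, 12_000, n)

fig, (ax_alpha, ax_hex) = plt.subplots(1, 2, figsize=(12, 4), sharey=True)

# Remedy 1: transparency, so dense regions appear darker.
ax_alpha.scatter(age, salary, s=5, alpha=0.05)
ax_alpha.set_title("Scatter with alpha=0.05")

# Remedy 2: 2D binning, where color encodes the number of points per cell.
# mincnt=1 leaves empty cells blank.
hb = ax_hex.hexbin(age, salary, gridsize=40, cmap="viridis", mincnt=1)
fig.colorbar(hb, ax=ax_hex, label="Points per bin")
ax_hex.set_title("Hexbin (2D binning)")

for ax in (ax_alpha, ax_hex):
    ax.set_xlabel("Age (years)")
ax_alpha.set_ylabel("Salary (USD)")
fig.tight_layout()
# Call plt.show() to display the figure.
```

Transparency is usually enough for a few thousand points. Binning tends to scale better beyond that because the rendered output no longer grows with the number of rows.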
8.3 Categorical charts
python
# Horizontal bar chart (good for many categories or long labels)
df['department'].value_counts().sort_values().plot(
    kind='barh', figsize=(8, 5), color='#4a90d9'
)
plt.title('Employees per Department')
plt.xlabel('Employees')
plt.tight_layout()
plt.show()

# Grouped bar chart: one group of bars per year, one bar per product
df.groupby(['year', 'product'])['sales'].sum().unstack().plot(
    kind='bar', figsize=(10, 5), edgecolor='white'
)
plt.ylabel('Sales')
plt.xticks(rotation=0)
plt.legend(title='Product')
plt.tight_layout()
plt.show()
8.4 Time series charts
python
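# Basic line chart (assumes df['date'] is already a datetime column;
# convert with pd.to_datetime(df['date']) if it is stored as text)
df_ts = df.set_index('date').sort_index()
df_ts['sales'].plot(figsize=(12, 4), color='#2196F3', linewidth=1.5)
plt.title('Sales Over Time')
plt.ylabel('Sales')
plt.grid(alpha=0.3)
plt.tight_layout()
plt.show()

# With rolling average (window=7 means 7 rows, i.e. 7 days only for daily data)
df_ts['rolling_7d'] = df_ts['sales'].rolling(window=7).mean()
df_ts[['sales', 'rolling_7d']].plot(figsize=(12, 4))
plt.title('Sales with 7-Row Rolling Average')
plt.tight_layout()
plt.show()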
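One detail worth checking in the rolling average: an integer window counts rows, not days. If some dates are missing, a time-based window such as `"7D"` on a `DatetimeIndex` may be closer to what is intended. The sketch below uses a tiny made-up series with a gap, and a 3-row versus 3-day window, to show the difference.

```python
import pandas as pd

# Made-up daily sales with a gap between Jan 3 and Jan 10.
dates = pd.to_datetime(
    ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-10", "2024-01-11"]
)
sales = pd.Series([10.0, 20.0, 30.0, 40.0, 50.0], index=dates, name="sales")

# Integer window: always the last 3 rows, regardless of how far apart they are.
by_rows = sales.rolling(window=3).mean()

# Offset window: only rows within the last 3 calendar days of each timestamp.
by_days = sales.rolling(window="3D").mean()

comparison = pd.DataFrame(
    {"sales": sales, "by_rows": by_rows, "by_days": by_days}
)
print(comparison)
```

On Jan 10 the row-based average still mixes in values from a week earlier, while the day-based average uses only the Jan 10 value. Offset windows also default to `min_periods=1`, so they do not start with missing values the way the integer window does.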
8.5 Interactive charts with Plotly
python
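import plotly.express as px

# Interactive scatter (the 'size' column must be non-negative with no missing values)
fig = px.scatter(df, x='age', y='salary', color='department',
                 size='experience', hover_name='name',
                 title='Salary by Age and Department')
fig.show()

# Interactive bar (bars for each product are placed side by side within a month)
fig = px.bar(df, x='month', y='revenue', color='product',
             barmode='group', title='Monthly Revenue by Product')
fig.show()

# Interactive line (time series), one line per region
fig = px.line(df, x='date', y='sales', color='region',
              title='Sales Over Time by Region')
fig.show()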
| Chart type | Use when... |
|---|---|
| Histogram | Show distribution of one numeric column |
| Boxplot | Compare distributions across groups, spot outliers |
| Bar chart | Compare counts or totals across categories |
| Scatter plot | Show relationship between two numeric columns |
| Line chart | Show change over time |
| Heatmap | Show correlations or 2D frequency tables |
| Pie chart | Show proportions (max 5-6 slices) |
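The heatmap row has no example elsewhere in this chapter, so here is a minimal sketch of a correlation heatmap with seaborn. The `demo` DataFrame is synthetic. Its column names mirror those used in the pairplot example, but the values are generated for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic numeric columns (made-up relationships, for illustration only).
rng = np.random.default_rng(1)
n = 200
experience = rng.uniform(0, 30, n)
demo = pd.DataFrame({
    "age": 22 + experience + rng.normal(0, 3, n),
    "experience": experience,
    "salary": 40_000 + 2_000 * experience + rng.normal(0, 8_000, n),
    "score": rng.normal(70, 10, n),
})

# Pairwise Pearson correlations between all numeric columns.
corr = demo.corr()

fig, ax = plt.subplots(figsize=(5, 4))
# A diverging colormap centered at 0, with the scale fixed to [-1, 1],
# keeps colors comparable across different datasets.
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm",
            vmin=-1, vmax=1, center=0, square=True, ax=ax)
ax.set_title("Correlation matrix")
fig.tight_layout()
# Call plt.show() to display the figure.
```

Fixing `vmin` and `vmax` is a design choice: without it, the color scale stretches to the observed range and weak correlations can look strong.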
Common mistakes to avoid
Quick cheatsheet
df.info() -> Structure and non-null counts
df.describe() -> Numeric summary statistics
df.isnull().sum() -> Missing-value counts by column
df.groupby() -> Segmented aggregation
pd.merge() -> Join multiple datasets
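To see the cheatsheet calls working together, the sketch below runs them on two tiny made-up tables. The `employees` and `departments` frames, and their column names, are hypothetical.

```python
import pandas as pd

# Hypothetical tables, for illustration only. One salary is missing on purpose.
employees = pd.DataFrame({
    "name": ["Ana", "Ben", "Chen", "Dee"],
    "dept_id": [1, 1, 2, 2],
    "salary": [90_000, None, 60_000, 70_000],
})
departments = pd.DataFrame({
    "dept_id": [1, 2],
    "department": ["Engineering", "Sales"],
})

employees.info()                      # structure and non-null counts
print(employees.describe())           # numeric summary statistics

missing = employees.isnull().sum()    # missing-value counts by column
print(missing)

# Join the two tables on their shared key, keeping every employee row.
merged = pd.merge(employees, departments, on="dept_id", how="left")

# Segmented aggregation: mean salary per department (NaN values are skipped).
mean_salary = merged.groupby("department")["salary"].mean()
print(mean_salary)
```

Running these checks before plotting helps catch missing values and unexpected dtypes that would otherwise surface as odd-looking charts.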