🎨 Welcome to Week 5: Data Visualization
"A picture is worth a thousand data points." This week, you'll learn to transform raw numbers into compelling visual stories that reveal insights instantly.
- Matplotlib Fundamentals — Create line, bar, and scatter plots with Python's core plotting library
- Seaborn Power — Generate beautiful statistical visualizations with minimal code
- Plot Customization — Add titles, labels, legends, colors, and styling for professional charts
- Multi-plot Layouts — Use subplots to create dashboards and comparative visualizations
- Design Principles — Apply data-ink ratio, color theory, and visualization best practices
Matplotlib = Raw Canvas and Paint: Total control, but you have to paint every detail yourself.
Seaborn = Paint-by-Numbers Kit: Beautiful defaults, pre-designed templates, but less fine-grained control.
In Practice? Use Seaborn for quick exploratory plots. Use Matplotlib when you need pixel-perfect customization. Often, you'll use both together!
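A minimal sketch of using the two libraries together, assuming a small made-up DataFrame (the column names and values are illustrative, not from the course data). Seaborn's axes-level functions draw onto a Matplotlib Axes, so the same ax can then be refined with ordinary Matplotlib calls:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only
df = pd.DataFrame({
    "hours": [1, 2, 3, 4, 5, 6],
    "score": [52, 58, 65, 70, 78, 85],
    "group": ["A", "B", "A", "B", "A", "B"],
})

fig, ax = plt.subplots(figsize=(8, 5))

# Seaborn draws the plot onto the Axes we pass in
sns.scatterplot(data=df, x="hours", y="score", hue="group", ax=ax)

# Matplotlib handles the fine-grained details
ax.set_title("Score vs. Study Hours")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Score")
ax.axhline(70, linestyle="--", color="gray", linewidth=1)  # reference line

fig.tight_layout()
```

Call plt.show() afterwards to display the figure in a script; notebooks render it automatically.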
Data visualization is how you communicate findings to non-technical stakeholders:
• Business Reports — Executives prefer charts over spreadsheets
• Exploratory Analysis — Spot outliers, trends, and patterns instantly
• Machine Learning — Visualize model performance, feature importance, and errors
• Scientific Research — Publish findings in papers with clear, professional plots
• Dashboards — Build interactive monitoring tools for live data
Each topic shows the same plot created 4 different ways:
🔴 Bad Practice — Code that produces ugly, unclear, or misleading charts
🟠 Novice Solution — Basic working plots that beginners create
🔵 Intermediate Solution — Better styling and customization
🟢 Best Practice — Publication-ready, professional visualizations
Remember: A chart's job is to communicate clearly. Beauty without clarity is decoration, not visualization!
1. Iterate visually: Run code, see the plot, adjust, repeat. Plotting is inherently experimental.
2. Start with Seaborn: For exploration, Seaborn is faster. Switch to Matplotlib for final polish.
3. Save examples: Build a personal gallery of plot templates you can reuse.
4. Think about your audience: Who will see this? What do they need to understand?
5. Less is more: Remove heavy gridlines, redundant labels, and decoration. Keep only what helps the reader see the data.
📉 Matplotlib Basics
Matplotlib is the grandfather of Python plotting. Its pyplot interface was modeled on MATLAB's plotting commands.
# ❌ BAD: Relying entirely on global state
import matplotlib.pyplot as plt
# Implicitly uses "current" figure
plt.plot([1, 2, 3], [4, 5, 6])
plt.title('My Plot')
plt.show()
# Problem: Hard to manage multiple plots or complex layouts
# 🔰 NOVICE: Simple line plot
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
plt.figure(figsize=(8, 6))
plt.plot(x, y)
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
# ⭐ BEST PRACTICE: Object-Oriented Interface
import matplotlib.pyplot as plt
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]
# Create Figure (canvas) and Axes (plot area) explicitly
fig, ax = plt.subplots(figsize=(8, 6))
# Plot on the specific axes object
ax.plot(x, y, marker='o', linestyle='--', color='#00f0ff')
# Set attributes on the axes
ax.set_title('Growth Over Time', fontsize=14)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Velocity (m/s)')
ax.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
Always use fig, ax = plt.subplots(). It gives you full control over every element and is essential for complex dashboards.
Description: Create a line chart that visualizes stock price movements over time using the object-oriented Matplotlib interface.
- Generate sample data: 30 days of stock prices starting at $100 with random daily changes (±5%)
- Use fig, ax = plt.subplots(figsize=(12, 6)) to create your canvas
- Plot the price line with markers at each data point
- Add a horizontal line at the starting price ($100) as a reference
- Color the line green if the final price is above $100, red if below
- Include proper axis labels: "Day" (x-axis) and "Price ($)" (y-axis)
- Add a title: "30-Day Stock Price Movement"
- Apply a grid with alpha=0.3 for readability
Bonus: Add annotations for the highest and lowest price points using ax.annotate().
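One possible starting point for this exercise, offered as a sketch rather than a reference solution. It assumes daily changes are drawn uniformly from ±5% and compounded from the $100 starting price; the seed and variable names are arbitrary choices:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # seeded for reproducibility

# 30 prices: start at $100, then apply 29 random daily changes within ±5%
daily_changes = rng.uniform(-0.05, 0.05, size=29)
prices = 100 * np.concatenate(([1.0], np.cumprod(1 + daily_changes)))
days = np.arange(1, 31)

# Green if the final price ends above the start, red otherwise
line_color = "green" if prices[-1] > 100 else "red"

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(days, prices, marker="o", color=line_color, label="Price")
ax.axhline(100, linestyle="--", color="gray", label="Starting price")

# Bonus: annotate the highest price (the lowest works the same way with argmin)
peak = int(np.argmax(prices))
ax.annotate(
    f"High: ${prices[peak]:.2f}",
    xy=(days[peak], prices[peak]),
    xytext=(0, 15),
    textcoords="offset points",
    ha="center",
    arrowprops={"arrowstyle": "->"},
)

ax.set_title("30-Day Stock Price Movement")
ax.set_xlabel("Day")
ax.set_ylabel("Price ($)")
ax.grid(True, alpha=0.3)
ax.legend()
fig.tight_layout()
```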
Description: Extend your Stock Price Tracker to compare 3 different stocks on the same chart.
- Generate price data for three fictional stocks: "TechCorp", "DataInc", "AILabs"
- Plot all three lines on the same axes with different colors
- Add a legend to identify each stock
- Calculate and display the percentage change for each stock in the legend label
- Use different line styles (solid, dashed, dotted) for accessibility
- Add a shaded region using ax.fill_between() to highlight a "volatility period" (days 10-20)
Pro Tip: Use ax.legend(loc='upper left') to position the legend away from the data.
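A rough sketch of the two less obvious steps, the percentage change in the legend label and the shaded region. The price generation (uniform ±5% daily changes from $100) and the seed are assumptions made for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
days = np.arange(1, 31)

# Fictional stocks from the exercise, each paired with a distinct line style
styles = {"TechCorp": "-", "DataInc": "--", "AILabs": ":"}

fig, ax = plt.subplots(figsize=(12, 6))
for name, linestyle in styles.items():
    changes = rng.uniform(-0.05, 0.05, size=29)
    prices = 100 * np.concatenate(([1.0], np.cumprod(1 + changes)))
    pct_change = (prices[-1] / prices[0] - 1) * 100
    # Put the percentage change directly in the legend label
    ax.plot(days, prices, linestyle=linestyle, label=f"{name} ({pct_change:+.1f}%)")

# Shade days 10-20 across the full height of the current y-range
y_low, y_high = ax.get_ylim()
in_window = (days >= 10) & (days <= 20)
ax.fill_between(days, y_low, y_high, where=in_window, color="gray", alpha=0.15,
                label="Volatility period")
ax.set_ylim(y_low, y_high)  # keep the limits from expanding after shading

ax.set_xlabel("Day")
ax.set_ylabel("Price ($)")
ax.legend(loc="upper left")
fig.tight_layout()
```

ax.axvspan(10, 20, alpha=0.15) is a shorter alternative when the shaded band should simply span the whole vertical range.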
- Using Default Styling: Default Matplotlib plots can look dated. Apply a style (for example plt.style.use('seaborn-v0_8'), available in Matplotlib 3.6+) or use Seaborn for modern aesthetics.
- Forgetting plt.close(): pyplot keeps every figure in memory until it is closed. In loops or scripts that create many plots, memory use grows with each figure, so close the ones you no longer need with plt.close(fig).
- Not Setting Figure Size: Tiny default figures make text unreadable. Set figsize in plt.subplots() to control dimensions (e.g., figsize=(10, 6)).
- Ignoring Axis Labels: Plots without an x-label, y-label, or title are hard to understand. Always label your axes and provide context.
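The plt.close() point can be sketched as follows. The render_charts helper and its input are hypothetical, and the images are written to in-memory buffers here only so the example leaves no files behind:

```python
import io

import matplotlib.pyplot as plt


def render_charts(series_by_name):
    """Render one PNG per series and return the images as bytes."""
    images = {}
    for name, values in series_by_name.items():
        fig, ax = plt.subplots(figsize=(10, 6))
        ax.plot(values, marker="o")
        ax.set_title(name)
        ax.set_xlabel("Index")
        ax.set_ylabel("Value")

        buffer = io.BytesIO()  # in-memory stand-in for a file on disk
        fig.savefig(buffer, format="png")
        images[name] = buffer.getvalue()

        plt.close(fig)  # release the figure so pyplot stops tracking it
    return images


images = render_charts({"first": [1, 3, 2], "second": [5, 4, 6]})
```

Without the plt.close(fig) line, every figure created in the loop would stay registered with pyplot until the program exits.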
- When would you choose matplotlib's pyplot interface over the object-oriented interface? What are the trade-offs?
- How does choosing the right figure size affect the readability of your visualizations in presentations vs. reports?
- What makes a data visualization "good"? Discuss the balance between aesthetic appeal and data integrity.
- Why is it important to close figures in production code? What real-world problems could arise from memory leaks in visualization scripts?
Open notebook-sessions/week5/session1_data_visualization.ipynb and reproduce the OO plotting pattern (fig, ax = plt.subplots()). Add labels, titles, and a style. Bonus: compare pyplot vs OO outputs.
🌊 Seaborn Power
Seaborn makes complex statistical plots simple. It integrates tightly with Pandas DataFrames.
# 🔰 NOVICE: Basic Scatter Plot
import seaborn as sns
import matplotlib.pyplot as plt
# Load built-in dataset
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()
# ⭐ BEST PRACTICE: Adding dimensions
import seaborn as sns
import matplotlib.pyplot as plt
# Set global theme
sns.set_theme(style="darkgrid")
tips = sns.load_dataset("tips")
# Encode more info: Color by 'time', Size by 'size'
plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=tips,
    x="total_bill",
    y="tip",
    hue="time",   # Color
    size="size",  # Dot size
    palette="viridis"
)
plt.title("Tips vs Total Bill (by Time of Day)")
plt.show()
Seaborn handles mapping data columns to visual attributes (hue, size, style) automatically. Doing the same in pure Matplotlib typically means looping over the groups yourself and building the legend by hand.
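To see what Seaborn is automating, here is a rough Matplotlib-only version of coloring points by category. It uses a small made-up DataFrame as a stand-in for tips, so the values are illustrative only:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in for the tips dataset (values are illustrative only)
df = pd.DataFrame({
    "total_bill": [10.5, 22.0, 15.3, 30.1, 18.7, 25.4],
    "tip": [1.5, 3.5, 2.0, 5.0, 3.0, 4.0],
    "time": ["Lunch", "Dinner", "Lunch", "Dinner", "Lunch", "Dinner"],
})

fig, ax = plt.subplots(figsize=(8, 5))

# One scatter call per category: this is the loop that hue="time" replaces
for time_of_day, group in df.groupby("time"):
    ax.scatter(group["total_bill"], group["tip"], label=time_of_day)

ax.set_xlabel("total_bill")
ax.set_ylabel("tip")
ax.legend(title="time")
fig.tight_layout()
```

Encoding a second column as marker size would need further manual scaling inside the loop, which is where the Seaborn one-liner saves the most work.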
Description: Create bar charts to compare product sales across different categories and regions using Seaborn.
- Create a DataFrame with columns: 'Product' (A, B, C, D), 'Region' (North, South, East, West), 'Sales' (random values)
- Use sns.barplot() to show average sales by Product
- Add hue='Region' to compare sales across regions within each product
- Apply the 'viridis' or 'coolwarm' color palette
- Add error bars to show confidence intervals (Seaborn does this automatically!)
- Rotate x-axis labels 45 degrees for readability using plt.xticks(rotation=45)
- Add a descriptive title and axis labels
Bonus: Create a second plot using sns.catplot(kind='bar') with col='Region' to create a faceted bar chart.
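A possible skeleton for the main part of this exercise. The sales figures are random made-up values, and five records per product/region pair is an arbitrary choice so that the error bars have something to summarize:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(1)
products = ["A", "B", "C", "D"]
regions = ["North", "South", "East", "West"]

# Five made-up sales records for every product/region pair
rows = [
    {"Product": p, "Region": r, "Sales": rng.integers(50, 200)}
    for p in products for r in regions for _ in range(5)
]
sales = pd.DataFrame(rows)

fig, ax = plt.subplots(figsize=(10, 6))
# barplot aggregates to the mean per bar and draws error bars by default
sns.barplot(data=sales, x="Product", y="Sales", hue="Region",
            palette="viridis", ax=ax)
ax.tick_params(axis="x", labelrotation=45)
ax.set_title("Average Sales by Product and Region")
ax.set_xlabel("Product")
ax.set_ylabel("Average sales")
fig.tight_layout()
```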
Description: Use scatter plots to visualize and analyze relationships between variables in a dataset.
- Load the built-in 'mpg' dataset: mpg = sns.load_dataset('mpg')
- Create a scatter plot of 'horsepower' vs 'mpg' (miles per gallon)
- Add hue='origin' to color points by country of origin (USA, Europe, Japan)
- Use size='weight' to encode vehicle weight in point sizes
- Add a regression line using sns.regplot() or sns.lmplot() (both fit a line by default via fit_reg=True)
- Interpret the correlation: Is it positive or negative? Strong or weak?
- Create a correlation heatmap for all numeric columns using sns.heatmap(mpg.corr(numeric_only=True), annot=True). The numeric_only=True argument is needed in pandas 2.0+ because mpg also contains text columns.
Questions to Answer: Which variable has the strongest correlation with MPG? How does the relationship vary by origin?
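The heatmap step can be sketched without downloading anything by using a tiny made-up table in place of mpg (the six rows below are invented for illustration). It also shows one way to answer the first question programmatically:

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up stand-in for the mpg dataset, including one text column
cars = pd.DataFrame({
    "mpg": [18.0, 26.0, 32.0, 15.0, 24.0, 36.0],
    "horsepower": [130, 90, 65, 165, 95, 58],
    "weight": [3504, 2430, 1985, 4209, 2372, 1825],
    "origin": ["usa", "europe", "japan", "usa", "japan", "europe"],
})

# numeric_only=True skips the text column instead of raising an error
corr = cars.corr(numeric_only=True)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation of numeric columns")
fig.tight_layout()

# The variable most strongly correlated with mpg (by absolute value)
strongest = corr["mpg"].drop("mpg").abs().idxmax()
```

Fixing vmin=-1 and vmax=1 keeps the color scale centered on zero, so positive and negative correlations are visually comparable.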
Description: Combine multiple Seaborn plots to tell a complete story about the 'tips' dataset.
- Load the tips dataset: tips = sns.load_dataset('tips')
- Create a pairplot() to see relationships between all numeric variables, colored by 'time'
- Create a violinplot() showing tip distribution by day and time
- Use jointplot() to show the relationship between total_bill and tip with marginal histograms
- Create a boxplot() comparing tips by day of week
- Write 3-4 bullet points describing insights you discovered from these visualizations
Reflection: Which plot type was most effective for each insight? Why?
- Forgetting the data= Parameter: When you refer to columns by name (for example x="total_bill"), Seaborn needs data=df to look them up. Omitting it causes errors, so pass your DataFrame explicitly.
- Not Understanding hue Semantics: The hue parameter maps a column to color. Categorical columns get distinct colors, while numeric columns are treated as continuous and get a color gradient, which can be surprising. Choose a suitable palette or bin the values first.
- Mixing Matplotlib and Seaborn Styling: Applying Matplotlib styles after Seaborn themes can override Seaborn's aesthetics. Set Seaborn themes early with sns.set_theme().
- Overusing Complex Plots: Plots like pairplot() and jointplot() are powerful but slow on large datasets. Sample your data first or use simpler plots for quick exploration.
- When would you choose a Seaborn statistical plot (like regplot) over a basic matplotlib scatter plot?
- How does Seaborn's automatic statistical aggregation (in plots like barplot) help or hurt data exploration?
- What are the advantages of using hue, size, and style parameters to encode multiple dimensions in a single plot?
- When does a visualization become "too busy"? How do you decide what information to include in a single plot vs. multiple plots?
Open notebook-sessions/week5/session1_data_visualization.ipynb and create a Seaborn scatter with hue and size. Try regplot for a regression line and experiment with sns.set_theme().
🖼️ Subplots & Layouts
Often you need to show multiple charts side-by-side for comparison.
import matplotlib.pyplot as plt
import numpy as np
x = np.linspace(0, 10, 100)
# Create a 2x2 grid of plots
fig, axes = plt.subplots(2, 2, figsize=(10, 8))
# Top Left
axes[0, 0].plot(x, np.sin(x), 'r')
axes[0, 0].set_title('Sine Wave')
# Top Right
axes[0, 1].plot(x, np.cos(x), 'g')
axes[0, 1].set_title('Cosine Wave')
# Bottom Left
axes[1, 0].bar(['A', 'B', 'C'], [10, 20, 15])
axes[1, 0].set_title('Categories')
# Bottom Right
axes[1, 1].hist(np.random.randn(1000), bins=30, color='purple')
axes[1, 1].set_title('Histogram')
plt.tight_layout() # Prevents overlap
plt.show()
- Confusing axes Indexing: With plt.subplots(2, 2), axes is a 2D array indexed as axes[row, col]. For a single row or column, it's 1D, and with no grid arguments plt.subplots() returns a single Axes rather than an array. Check axes.shape to avoid IndexError.
- Forgetting tight_layout(): Without plt.tight_layout(), subplot titles and labels often overlap. Always call this before showing or saving multi-subplot figures.
- Sharing Axes Incorrectly: Using sharex=True or sharey=True hides tick labels on inner subplots. Understand when sharing axes helps (e.g., time series comparisons) and when it hinders clarity.
- Inconsistent Subplot Sizes: Not using figsize or gridspec properly can make subplots tiny or distorted. Always set appropriate figure dimensions for the number of subplots.
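The indexing and shared-axes points above can be sketched as follows. The sine curves are placeholder data; squeeze=False is one way to get a 2D array regardless of grid shape:

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# squeeze=False always returns a 2D array, even for a single row
fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharey=True, squeeze=False)
print(axes.shape)  # (1, 3)

# axes.flat iterates over every Axes regardless of grid shape
for ax, freq in zip(axes.flat, [1, 2, 3]):
    ax.plot(x, np.sin(freq * x))
    ax.set_title(f"sin({freq}x)")
    ax.set_xlabel("x")

# With sharey=True, label the shared axis once on the left-most panel
axes[0, 0].set_ylabel("Amplitude")
fig.tight_layout()
```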
- When should you use subplots vs. separate figures? What are the trade-offs for readability and comparison?
- How does sharex or sharey improve (or complicate) multi-plot comparisons? Give an example of when this is useful.
- What layout strategies (grid, horizontal, vertical) work best for different types of data stories (e.g., time series, category comparisons)?
- How can you use subplots to create "small multiples" (Edward Tufte's concept)? Why is this powerful for showing patterns across categories?
Challenge: Create a 2x2 dashboard showing:
- Top-left: Line chart of daily temperatures
- Top-right: Bar chart of monthly rainfall
- Bottom-left: Scatter plot of temperature vs. rainfall
- Bottom-right: Histogram of temperature distribution
Bonus: Use fig.suptitle() to add an overall title to your dashboard.
Description: Build a complete weather analysis dashboard using subplots with realistic data.
- Generate 365 days of weather data: temperature (seasonal pattern + noise), rainfall (random), humidity
- Create a 2x2 subplot grid with figsize=(14, 10)
- Top-left: Line plot of daily temperature with a 7-day rolling average overlay
- Top-right: Monthly total rainfall as a bar chart (aggregate your daily data)
- Bottom-left: Scatter plot of temperature vs. humidity with seasonal color coding
- Bottom-right: Histogram of temperature distribution with mean/median lines
- Use fig.suptitle('Annual Weather Analysis Dashboard', fontsize=16)
- Apply plt.tight_layout(rect=[0, 0, 1, 0.96]) to make room for the title
Tip: Use np.random.seed(42) at the start for reproducible random data.
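The data-preparation steps (seasonal pattern, rolling average, monthly aggregation) might look like the sketch below. The year 2023, the temperature range, the January-coldest pattern, and the exponential rainfall model are all assumptions chosen for illustration:

```python
import numpy as np
import pandas as pd

np.random.seed(42)  # reproducible random data, as suggested in the tip

dates = pd.date_range("2023-01-01", periods=365, freq="D")
day_of_year = np.arange(365)

# Seasonal temperature: a yearly cosine wave (coldest in January) plus noise
temperature = (15 - 10 * np.cos(2 * np.pi * day_of_year / 365)
               + np.random.normal(0, 3, 365))
rainfall = np.random.exponential(scale=2.0, size=365)  # mm per day, made up

weather = pd.DataFrame(
    {"temperature": temperature, "rainfall": rainfall}, index=dates
)

# 7-day rolling average for the top-left overlay (first 6 values are NaN)
weather["temp_7d"] = weather["temperature"].rolling(window=7).mean()

# Monthly rainfall totals for the top-right bar chart
monthly_rain = weather["rainfall"].groupby(weather.index.month).sum()
```

From here, weather["temp_7d"] can be drawn over the daily line with a second ax.plot() call, and monthly_rain can be passed to ax.bar(monthly_rain.index, monthly_rain.values).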
Description: Create a publication-ready multi-chart dashboard combining Matplotlib and Seaborn techniques.
- Load a real dataset (use sns.load_dataset('diamonds') or the Titanic dataset)
- Create a 2x3 grid of subplots (fig, axes = plt.subplots(2, 3, figsize=(18, 10)))
- Plot 1: Distribution histogram of the main numeric variable
- Plot 2: Box plot comparing categories
- Plot 3: Scatter plot with regression line showing two variable relationships
- Plot 4: Bar chart of counts by category
- Plot 5: Correlation heatmap of numeric columns
- Plot 6: Violin or swarm plot for distribution comparison
- Apply consistent styling: use sns.set_theme(style='whitegrid')
- Add a main title using fig.suptitle() with your analysis theme
- Export the dashboard as a high-resolution PNG: plt.savefig('dashboard.png', dpi=300, bbox_inches='tight')
Portfolio Piece: This dashboard can serve as a portfolio project—add insights as text annotations!
Open notebook-sessions/week5/session2_data_visualization_group.ipynb and build a 2x2 dashboard with shared axes where appropriate. Use tight_layout() and fig.suptitle().
📜 Visualization Cheat Sheet
# ═══ SETUP ═══
import matplotlib.pyplot as plt
import seaborn as sns
# ═══ FIGURE & AXES ═══
fig, ax = plt.subplots(figsize=(10, 6))
fig, axes = plt.subplots(2, 2) # 2x2 Grid
# ═══ BASIC PLOTS (Matplotlib) ═══
ax.plot(x, y) # Line
ax.bar(cats, vals) # Bar
ax.scatter(x, y) # Scatter
ax.hist(data, bins=20) # Histogram
# ═══ STATISTICAL PLOTS (Seaborn) ═══
sns.scatterplot(data=df, x='col1', y='col2', hue='cat')
sns.barplot(data=df, x='cat', y='num')
sns.boxplot(data=df, x='cat', y='num')
sns.heatmap(df.corr(numeric_only=True), annot=True)  # numeric_only skips text columns
# ═══ CUSTOMIZATION ═══
ax.set_title('Title')
ax.set_xlabel('X Label')
ax.tick_params(axis='x', labelrotation=45)  # OO equivalent of plt.xticks(rotation=45)
ax.legend()
plt.tight_layout()
- Choose the Right Tool: Use matplotlib for full control and customization, Seaborn for quick statistical plots with beautiful defaults.
- Always Use OO Interface: fig, ax = plt.subplots() gives you explicit control over every element, which is essential for complex visualizations.
- Label Everything: Titles, axis labels, and legends turn a plot into a communication tool. Never skip them.
- Leverage Seaborn's Power: Use hue, size, and style to encode multiple dimensions in a single plot efficiently.
- Master Subplots: Multi-panel figures (plt.subplots()) enable powerful comparisons and storytelling with data.
- Iterate Visually: Visualization is exploratory. Create rough plots quickly, then refine based on what the data reveals.
- Mind the Details: Figure size, color palettes, and tight_layout() make the difference between amateur and professional visualizations.
- Less is More: Remove unnecessary chart junk (excessive gridlines, 3D effects, too many colors) to focus attention on the data itself.