🎨 Welcome to Week 5: Data Visualization

"A picture is worth a thousand data points." This week, you'll learn to transform raw numbers into compelling visual stories that reveal insights instantly.

🎯 What You'll Learn Today
  • Matplotlib Fundamentals — Create line, bar, and scatter plots with Python's core plotting library
  • Seaborn Power — Generate beautiful statistical visualizations with minimal code
  • Plot Customization — Add titles, labels, legends, colors, and styling for professional charts
  • Multi-plot Layouts — Use subplots to create dashboards and comparative visualizations
  • Design Principles — Apply data-ink ratio, color theory, and visualization best practices

📚 Key Terms to Know

Figure
The overall container for your plot(s). Like the canvas or page.
Axes
The actual plot area with x and y axes. One Figure can contain multiple Axes (subplots).
Artist
Every element you see (lines, text, ticks) is an Artist object in Matplotlib.
Data-Ink Ratio
The share of a chart's ink that shows data rather than decoration (Edward Tufte's term). Higher is better, so remove chartjunk!
Color Palette
The set of colors used in a chart. Should be accessible (colorblind-friendly) and meaningful.
Seaborn
A high-level plotting library built on Matplotlib with beautiful defaults and statistical functions.
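
To make the first three terms concrete, here is a minimal sketch that builds one Figure with two Axes and checks that the things drawn on them are Artists. The variable names and data are illustrative only.

```python
import matplotlib.pyplot as plt
from matplotlib.artist import Artist
from matplotlib.figure import Figure

# One Figure (the canvas) holding two Axes (the plot areas)
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(8, 3))

# ax.plot() returns a list of Line2D objects; each one is an Artist
(line,) = ax_left.plot([1, 2, 3], [2, 4, 8])
title = ax_right.set_title("Empty Axes")  # the title text is an Artist too

print(isinstance(fig, Figure))      # True
print(len(fig.axes))                # 2
print(isinstance(line, Artist))     # True
print(isinstance(title, Artist))    # True
print(isinstance(ax_left, Artist))  # True: Axes (and the Figure) are Artists as well
```

In a script, call plt.show() afterwards to display the figure; in a notebook it renders automatically.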

🚀 Interactive Companion Notebook

Data Wrangling & Aggregation — A comprehensive 2-hour interactive notebook covering selection, groupby, merging, and transformation with progressive examples (Novice → Intermediate → Advanced).

✅ Rich visualizations with Matplotlib & Seaborn
✅ Real-world sales analysis practice
✅ Assignment preview for Kaggle football dataset

🗺️ Today's Topics

Click any card below to jump directly to that section:

📉 Matplotlib Basics
Master Python's foundational plotting library with pyplot and object-oriented APIs.
📊 Seaborn Statistical Plots
Create beautiful statistical visualizations with minimal code and smart defaults.
🎭 Subplots & Layouts
Build multi-plot dashboards and comparative visualizations side-by-side.

💡 Matplotlib vs Seaborn: An Analogy

Matplotlib = Raw Canvas and Paint: Total control, but you have to paint every detail yourself.

Seaborn = Paint-by-Numbers Kit: Beautiful defaults, pre-designed templates, but less fine-grained control.

In Practice? Use Seaborn for quick exploratory plots. Use Matplotlib when you need pixel-perfect customization. Often, you'll use both together!
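
Here is one way "using both together" might look: Seaborn draws the plot onto an Axes you created, and Matplotlib methods then polish that same Axes. The DataFrame, its column names, and the pass mark of 60 are made up for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data so the example runs without downloading a dataset
df = pd.DataFrame({
    "hours_studied": [1, 2, 3, 4, 5, 6, 7, 8],
    "exam_score":    [52, 55, 61, 64, 70, 74, 79, 85],
    "group":         ["A", "B", "A", "B", "A", "B", "A", "B"],
})

fig, ax = plt.subplots(figsize=(8, 5))

# Seaborn maps the 'group' column to colors and draws onto our Axes
sns.scatterplot(data=df, x="hours_studied", y="exam_score", hue="group", ax=ax)

# Matplotlib handles the fine-tuning on the same Axes
ax.set_title("Exam Score vs. Hours Studied")
ax.set_xlabel("Hours studied")
ax.set_ylabel("Exam score (%)")
ax.axhline(60, color="gray", linestyle=":")  # hypothetical pass mark
```

Passing ax=ax is what ties the two libraries together: everything Seaborn draws lands on an ordinary Matplotlib Axes.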

🌍 Why Visualization Matters in the Real World

Data visualization is how you communicate findings to non-technical stakeholders:

Business Reports — Executives prefer charts over spreadsheets

Exploratory Analysis — Spot outliers, trends, and patterns instantly

Machine Learning — Visualize model performance, feature importance, and errors

Scientific Research — Publish findings in papers with clear, professional plots

Dashboards — Build interactive monitoring tools for live data

📖 How to Use This Lesson

Each topic shows the same plot created 4 different ways:

🔴 Bad Practice — Code that produces ugly, unclear, or misleading charts

🟠 Novice Solution — Basic working plots that beginners create

🔵 Intermediate Solution — Better styling and customization

🟢 Best Practice — Publication-ready, professional visualizations

Remember: A chart's job is to communicate clearly. Beauty without clarity is decoration, not visualization!

📚 How to Approach This Lesson

1. Iterate visually: Run code, see the plot, adjust, repeat. Plotting is inherently experimental.

2. Start with Seaborn: For exploration, Seaborn is faster. Switch to Matplotlib for final polish.

3. Save examples: Build a personal gallery of plot templates you can reuse.

4. Think about your audience: Who will see this? What do they need to understand?

5. Less is more: Remove heavy gridlines, redundant labels, and decoration. Focus on the data.
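
As a small illustration of "less is more", the sketch below strips non-data ink from a bar chart. The data is invented, and ax.bar_label() assumes Matplotlib 3.4 or newer.

```python
import matplotlib.pyplot as plt

categories = ["A", "B", "C", "D"]  # illustrative data
values = [23, 17, 35, 29]

fig, ax = plt.subplots(figsize=(6, 4))
bars = ax.bar(categories, values, color="steelblue")

# Remove the box lines that carry no data
for side in ("top", "right", "left"):
    ax.spines[side].set_visible(False)

# Label the bars directly, then drop the now-redundant y-axis ticks
labels = ax.bar_label(bars)
ax.tick_params(axis="y", left=False, labelleft=False)

ax.set_title("Sales by Category")
```

Whether direct labels beat a y-axis depends on the audience; with only a handful of bars, they usually read faster.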

📉 Matplotlib Basics

Matplotlib is the grandfather of Python plotting. Its pyplot interface was modeled on MATLAB's plotting commands.

Python ❌ Bad Practice Implicit Pyplot
# ❌ BAD: Relying entirely on global state
import matplotlib.pyplot as plt

# Implicitly uses "current" figure
plt.plot([1, 2, 3], [4, 5, 6])
plt.title('My Plot')
plt.show()

# Problem: Hard to manage multiple plots or complex layouts
Python 🔰 Novice Basic Plotting
# 🔰 NOVICE: Simple line plot
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

plt.figure(figsize=(8, 6))
plt.plot(x, y)
plt.xlabel('Time')
plt.ylabel('Value')
plt.show()
Python ⭐ Best Practice Object-Oriented API
# ⭐ BEST PRACTICE: Object-Oriented Interface
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Create Figure (canvas) and Axes (plot area) explicitly
fig, ax = plt.subplots(figsize=(8, 6))

# Plot on the specific axes object
ax.plot(x, y, marker='o', linestyle='--', color='#00f0ff')

# Set attributes on the axes
ax.set_title('Growth Over Time', fontsize=14)
ax.set_xlabel('Time (s)')
ax.set_ylabel('Velocity (m/s)')
ax.grid(True, alpha=0.3)

plt.tight_layout()
plt.show()
⭐ Pro Tip: OO Interface

Always use fig, ax = plt.subplots(). It gives you full control over every element and is essential for complex dashboards.
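
To see why explicit Axes matter, this short sketch mixes one pyplot call with one OO call on a two-panel figure. The titles are arbitrary; watch which panel each call changes.

```python
import matplotlib.pyplot as plt

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))
ax_left.plot([1, 2, 3], [4, 5, 6])
ax_right.plot([1, 2, 3], [6, 5, 4])

# pyplot call: applies to the "current" Axes, which is the last one created
plt.title("Set through pyplot")

# OO call: applies to exactly the Axes you name
ax_left.set_title("Set through the Axes object")

print(repr(ax_left.get_title()))   # 'Set through the Axes object'
print(repr(ax_right.get_title()))  # 'Set through pyplot'
```

The pyplot title landed on the right-hand panel only because that panel happened to be "current". With the OO interface, the target is never a guess.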

🏋️ Exercise: Stock Price Tracker

Description: Create a line chart that visualizes stock price movements over time using the object-oriented Matplotlib interface.

  • Generate sample data: 30 days of stock prices starting at $100 with random daily changes (±5%)
  • Use fig, ax = plt.subplots(figsize=(12, 6)) to create your canvas
  • Plot the price line with markers at each data point
  • Add a horizontal line at the starting price ($100) as a reference
  • Color the line green if the final price is above $100, red if below
  • Include proper axis labels: "Day" (x-axis) and "Price ($)" (y-axis)
  • Add a title: "30-Day Stock Price Movement"
  • Apply a grid with alpha=0.3 for readability

Bonus: Add annotations for the highest and lowest price points using ax.annotate().
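
If you are unsure where to begin, the following is one possible starting point, not a full solution. It assumes the daily changes are drawn uniformly between -5% and +5% and compounded, uses an arbitrary seed, and annotates only the highest price. Colors, labels, the title, and the grid are left for you.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # fixed seed (an arbitrary choice) for repeatable data

# 29 daily changes between -5% and +5%, compounded from a $100 start
daily_changes = rng.uniform(-0.05, 0.05, size=29)
prices = 100 * np.concatenate(([1.0], np.cumprod(1 + daily_changes)))
days = np.arange(1, 31)

fig, ax = plt.subplots(figsize=(12, 6))
ax.plot(days, prices, marker="o")
ax.axhline(100, color="gray", linestyle="--", label="Starting price")

# Annotate the highest point; the text sits 20 points above the marker
peak = int(np.argmax(prices))
ax.annotate(
    f"High: ${prices[peak]:.2f}",
    xy=(days[peak], prices[peak]),
    xytext=(0, 20),
    textcoords="offset points",
    ha="center",
    arrowprops={"arrowstyle": "->"},
)
ax.legend()
```

The same ax.annotate() pattern with np.argmin() and a negative vertical offset handles the lowest price.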

🚀 Challenge: Multi-Stock Comparison

Description: Extend your Stock Price Tracker to compare 3 different stocks on the same chart.

  • Generate price data for three fictional stocks: "TechCorp", "DataInc", "AILabs"
  • Plot all three lines on the same axes with different colors
  • Add a legend to identify each stock
  • Calculate and display the percentage change for each stock in the legend label
  • Use different line styles (solid, dashed, dotted) for accessibility
  • Add a shaded region using ax.fill_between() to highlight a "volatility period" (days 10-20)

Pro Tip: Use ax.legend(loc='upper left') to position the legend away from the data.

⚠️ Common Pitfalls: Matplotlib
  • Using Default Styling: Default Matplotlib plots can look dated. Apply a style such as plt.style.use('seaborn-v0_8') (the name used since Matplotlib 3.6; older releases call it 'seaborn'), or use Seaborn for modern aesthetics.
  • Forgetting plt.close(): pyplot keeps every figure in memory until it is closed, so loops or scripts that create many plots steadily consume more memory. Close figures you no longer need with plt.close(fig).
  • Not Setting Figure Size: Tiny default figures make text unreadable. Always set figsize in plt.subplots() to control dimensions (e.g., figsize=(10, 6)).
  • Ignoring Axis Labels: Plots without xlabel, ylabel, or title are hard to understand. Always label your axes and provide context.
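
The pitfalls above can be avoided together in one loop. This is a minimal sketch with made-up regional data; it saves each chart to an in-memory buffer so it runs without touching the disk, where a real script would pass a filename.

```python
import io

import matplotlib.pyplot as plt

datasets = {"north": [3, 5, 4], "south": [2, 6, 7], "east": [5, 5, 6]}  # made-up data
rendered = {}

for name, values in datasets.items():
    fig, ax = plt.subplots(figsize=(10, 6))  # explicit figure size
    ax.plot(values, marker="o")
    ax.set_title(f"Region: {name}")          # title and axis labels on every chart
    ax.set_xlabel("Week")
    ax.set_ylabel("Sales")

    buffer = io.BytesIO()                    # in-memory stand-in for a file path
    fig.savefig(buffer, format="png")
    rendered[name] = buffer.getvalue()

    plt.close(fig)                           # release the figure before the next one

print(plt.get_fignums())  # [] when run as a fresh script: nothing left open
```

plt.get_fignums() is a quick way to check for forgotten figures: any numbers it lists are figures pyplot is still holding on to.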
💬 Discussion Questions
  1. When would you choose Matplotlib's pyplot interface over the object-oriented interface? What are the trade-offs?
  2. How does choosing the right figure size affect the readability of your visualizations in presentations vs. reports?
  3. What makes a data visualization "good"? Discuss the balance between aesthetic appeal and data integrity.
  4. Why is it important to close figures in production code? What real-world problems could arise from memory leaks in visualization scripts?
📓 Practice in Notebook

Open notebook-sessions/week5/session1_data_visualization.ipynb and reproduce the OO plotting pattern (fig, ax = plt.subplots()). Add labels, titles, and a style. Bonus: compare pyplot vs OO outputs.

🌊 Seaborn Power

Seaborn makes complex statistical plots simple. It integrates tightly with Pandas DataFrames.

Python 🔰 Novice Basic Seaborn
# 🔰 NOVICE: Basic Scatter Plot
import seaborn as sns
import matplotlib.pyplot as plt

# Load built-in dataset
tips = sns.load_dataset("tips")

sns.scatterplot(data=tips, x="total_bill", y="tip")
plt.show()
Python ⭐ Best Practice Statistical & Styled
# ⭐ BEST PRACTICE: Adding dimensions
import seaborn as sns
import matplotlib.pyplot as plt

# Set global theme
sns.set_theme(style="darkgrid")

tips = sns.load_dataset("tips")

# Encode more info: Color by 'time', Size by 'size'
plt.figure(figsize=(10, 6))
sns.scatterplot(
    data=tips,
    x="total_bill",
    y="tip",
    hue="time",   # Color
    size="size",  # Dot size
    palette="viridis"
)

plt.title("Tips vs Total Bill (by Time of Day)")
plt.show()
⭐ Why Seaborn?

Seaborn handles mapping data columns to visual attributes (hue, size, style) automatically. In pure Matplotlib you would have to loop over the groups and build the legend yourself.
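
For comparison, this is roughly what the color mapping looks like when done by hand in Matplotlib. The small table is invented and only shaped like the tips data; it is not the real dataset.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Small made-up table shaped like the tips data used above
df = pd.DataFrame({
    "total_bill": [10.3, 21.0, 23.7, 24.6, 16.9, 48.3, 13.4, 35.3],
    "tip":        [1.7, 3.5, 3.3, 3.6, 1.0, 6.7, 1.6, 5.0],
    "time":       ["Lunch", "Dinner", "Dinner", "Dinner",
                   "Lunch", "Dinner", "Lunch", "Dinner"],
})

fig, ax = plt.subplots(figsize=(8, 5))

# One scatter call per category: this is the loop that hue="time" replaces
for time_of_day, group in df.groupby("time"):
    ax.scatter(group["total_bill"], group["tip"], label=time_of_day)

ax.set_xlabel("total_bill")
ax.set_ylabel("tip")
ax.legend(title="time")
```

Adding a size encoding would mean computing marker sizes and a second legend by hand, which is the work Seaborn's size= parameter does for you.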

🏋️ Exercise: Sales Comparison

Description: Create bar charts to compare product sales across different categories and regions using Seaborn.

  • Create a DataFrame with columns: 'Product' (A, B, C, D), 'Region' (North, South, East, West), 'Sales' (random values)
  • Use sns.barplot() to show average sales by Product
  • Add hue='Region' to compare sales across regions within each product
  • Apply the 'viridis' or 'coolwarm' color palette
  • Add error bars to show confidence intervals (Seaborn does this automatically!)
  • Rotate x-axis labels 45 degrees for readability using plt.xticks(rotation=45)
  • Add a descriptive title and axis labels

Bonus: Create a second plot using sns.catplot(kind='bar') with col='Region' to create a faceted bar chart.

🏋️ Exercise: Correlation Analysis

Description: Use scatter plots to visualize and analyze relationships between variables in a dataset.

  • Load the built-in 'mpg' dataset: mpg = sns.load_dataset('mpg')
  • Create a scatter plot of 'horsepower' vs 'mpg' (miles per gallon)
  • Add hue='origin' to color points by country of origin (USA, Europe, Japan)
  • Use size='weight' to encode vehicle weight in point sizes
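  • Add a regression line with sns.regplot() (scatter=False overlays just the fitted line on your existing scatter plot)
  • Interpret the correlation: Is it positive or negative? Strong or weak?
  • Create a correlation heatmap for all numeric columns using sns.heatmap(mpg.corr(numeric_only=True), annot=True); numeric_only=True skips the text columns ('origin', 'name'), which pandas 2.0+ otherwise rejects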

Questions to Answer: Which variable has the strongest correlation with MPG? How does the relationship vary by origin?
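
The heatmap step can be rehearsed on synthetic data first. The sketch below invents a small car-like table (these are not the real mpg values) in which heavier cars get fewer miles per gallon, and it assumes pandas 1.5 or newer for numeric_only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-in for a car dataset: heavier cars get fewer miles per gallon
weight = rng.normal(3000, 500, n)
cars = pd.DataFrame({
    "weight": weight,
    "mpg": 45 - 0.008 * weight + rng.normal(0, 1.5, n),
    "model_year": rng.integers(70, 83, n),
    "origin": rng.choice(["usa", "europe", "japan"], n),  # text column
})

# numeric_only=True leaves out the text column instead of raising an error
corr = cars.corr(numeric_only=True)

# Which variable tracks mpg most strongly (ignoring mpg itself)?
strongest = corr["mpg"].drop("mpg").abs().idxmax()
print(strongest)  # weight

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
```

Fixing vmin=-1 and vmax=1 keeps the color scale symmetric, so zero correlation always sits at the neutral midpoint of the palette.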

🚀 Challenge: Tell a Statistical Story

Description: Combine multiple Seaborn plots to tell a complete story about the 'tips' dataset.

  • Load the tips dataset: tips = sns.load_dataset('tips')
  • Create a pairplot() to see relationships between all numeric variables, colored by 'time'
  • Create a violinplot() showing tip distribution by day and time
  • Use jointplot() to show the relationship between total_bill and tip with marginal histograms
  • Create a boxplot() comparing tips by day of week
  • Write 3-4 bullet points describing insights you discovered from these visualizations

Reflection: Which plot type was most effective for each insight? Why?

⚠️ Common Pitfalls: Seaborn
  • Forgetting the data= Parameter: When x, y, or hue are column names (strings), Seaborn needs data=df to look them up, and omitting it raises an error. Pass your DataFrame, or pass the arrays themselves instead of names.
  • Not Understanding hue Semantics: hue gives string or categorical columns discrete colors, but maps numeric columns onto a continuous gradient. A numeric column that is really a category (a year or an ID, for example) can therefore produce a misleading legend. Convert it to a categorical type or bin it first, and pick a palette that matches the data type.
  • Mixing Matplotlib and Seaborn Styling: Applying Matplotlib styles after Seaborn themes can override Seaborn's aesthetics. Set Seaborn themes early with sns.set_theme().
  • Overusing Complex Plots: Plots like pairplot() and jointplot() are powerful but slow on large datasets. Sample your data first or use simpler plots for quick exploration.
💬 Discussion Questions
  1. When would you choose a Seaborn statistical plot (like regplot) over a basic Matplotlib scatter plot?
  2. How does Seaborn's automatic statistical aggregation (in plots like barplot) help or hurt data exploration?
  3. What are the advantages of using hue, size, and style parameters to encode multiple dimensions in a single plot?
  4. When does a visualization become "too busy"? How do you decide what information to include in a single plot vs. multiple plots?
📓 Practice in Notebook

Open notebook-sessions/week5/session1_data_visualization.ipynb and create a Seaborn scatter with hue and size. Try regplot for a regression line and experiment with sns.set_theme().

🖼️ Subplots & Layouts

Often you need to show multiple charts side-by-side for comparison.

Python ⭐ Best Practice 2x2 Grid
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 100)

# Create a 2x2 grid of plots
fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Top Left
axes[0, 0].plot(x, np.sin(x), 'r')
axes[0, 0].set_title('Sine Wave')

# Top Right
axes[0, 1].plot(x, np.cos(x), 'g')
axes[0, 1].set_title('Cosine Wave')

# Bottom Left
axes[1, 0].bar(['A', 'B', 'C'], [10, 20, 15])
axes[1, 0].set_title('Categories')

# Bottom Right
axes[1, 1].hist(np.random.randn(1000), bins=30, color='purple')
axes[1, 1].set_title('Histogram')

plt.tight_layout()  # Prevents overlap
plt.show()
⚠️ Common Pitfalls: Subplots
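  • Confusing axes Indexing: With plt.subplots(2, 2), axes is a 2D array indexed as axes[row, col]. For a single row or column, it's 1D, and for a single plot it is one Axes object rather than an array. Check axes.shape (or pass squeeze=False to always get a 2D array) to avoid IndexError.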
  • Forgetting tight_layout(): Without plt.tight_layout(), subplot titles and labels often overlap. Always call this before showing or saving multi-subplot figures.
  • Sharing Axes Incorrectly: Using sharex=True or sharey=True hides the tick labels on inner subplots. Understand when sharing axes helps (e.g., time series comparisons) and when it hinders clarity.
  • Inconsistent Subplot Sizes: Not using figsize or gridspec properly can make subplots tiny or distorted. Always set appropriate figure dimensions for the number of subplots.
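
The indexing pitfall is easiest to understand by printing shapes. This short sketch shows what plt.subplots() returns for a few common layouts; the variable names are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np

fig_grid, axes_grid = plt.subplots(2, 2)
fig_row, axes_row = plt.subplots(1, 3)
fig_one, ax_one = plt.subplots()
fig_safe, axes_safe = plt.subplots(1, 3, squeeze=False)

print(axes_grid.shape)                 # (2, 2) -> index as axes_grid[row, col]
print(axes_row.shape)                  # (3,)   -> index as axes_row[col]
print(isinstance(ax_one, np.ndarray))  # False  -> a single Axes, no indexing
print(axes_safe.shape)                 # (1, 3) -> always 2D with squeeze=False

# .flat visits every Axes regardless of the grid's shape
for i, ax in enumerate(axes_grid.flat):
    ax.set_title(f"Panel {i + 1}")
```

Looping over axes.flat is a convenient habit when the number of rows and columns may change, since the loop body stays the same.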
💬 Discussion Questions
  1. When should you use subplots vs. separate figures? What are the trade-offs for readability and comparison?
  2. How does sharex or sharey improve (or complicate) multi-plot comparisons? Give an example of when this is useful.
  3. What layout strategies (grid, horizontal, vertical) work best for different types of data stories (e.g., time series, category comparisons)?
  4. How can you use subplots to create "small multiples" (Edward Tufte's concept)? Why is this powerful for showing patterns across categories?
🎮 Practice Playground: Dashboard Creation

Challenge: Create a 2x2 dashboard showing:

  • Top-left: Line chart of daily temperatures
  • Top-right: Bar chart of monthly rainfall
  • Bottom-left: Scatter plot of temperature vs. rainfall
  • Bottom-right: Histogram of temperature distribution

Bonus: Use fig.suptitle() to add an overall title to your dashboard.

🏋️ Exercise: Weather Analysis Dashboard

Description: Build a complete weather analysis dashboard using subplots with realistic data.

  • Generate 365 days of weather data: temperature (seasonal pattern + noise), rainfall (random), humidity
  • Create a 2x2 subplot grid with figsize=(14, 10)
  • Top-left: Line plot of daily temperature with a 7-day rolling average overlay
  • Top-right: Monthly total rainfall as a bar chart (aggregate your daily data)
  • Bottom-left: Scatter plot of temperature vs. humidity with seasonal color coding
  • Bottom-right: Histogram of temperature distribution with mean/median lines
  • Use fig.suptitle('Annual Weather Analysis Dashboard', fontsize=16)
  • Apply plt.tight_layout(rect=[0, 0, 1, 0.96]) to make room for the title

Tip: Use np.random.seed(42) at the start for reproducible random data.
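
As a hedged starting point for the top-left panel only, the sketch below generates a seasonal temperature series and overlays a 7-day rolling average. The sine-wave model, the 2023 start date, and the noise level are assumptions made for illustration; the other three panels are left empty.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(42)  # reproducible random data

# Seasonal temperature: a yearly sine wave plus daily noise (assumed model)
days = pd.date_range("2023-01-01", periods=365, freq="D")
seasonal = 15 + 10 * np.sin(2 * np.pi * (np.arange(365) - 80) / 365)
temperature = pd.Series(seasonal + np.random.normal(0, 3, 365), index=days)

# 7-day rolling mean; the first 6 values are NaN because the window is incomplete
rolling = temperature.rolling(window=7).mean()

fig, axes = plt.subplots(2, 2, figsize=(14, 10))
ax = axes[0, 0]
ax.plot(temperature.index, temperature, alpha=0.4, label="Daily")
ax.plot(rolling.index, rolling, linewidth=2, label="7-day average")
ax.set_title("Daily Temperature")
ax.set_ylabel("Temperature (°C)")
ax.legend()

fig.suptitle("Annual Weather Analysis Dashboard", fontsize=16)
fig.tight_layout(rect=[0, 0, 1, 0.96])  # leave the top 4% for the suptitle
```

For the monthly rainfall panel, a similar Series indexed by date can be aggregated by month before plotting.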

🚀 Challenge: Professional Analytics Dashboard

Description: Create a publication-ready multi-chart dashboard combining Matplotlib and Seaborn techniques.

  • Load a real dataset (use sns.load_dataset('diamonds') or the Titanic dataset)
  • Create a 2x3 grid of subplots (fig, axes = plt.subplots(2, 3, figsize=(18, 10)))
  • Plot 1: Distribution histogram of the main numeric variable
  • Plot 2: Box plot comparing categories
  • Plot 3: Scatter plot with regression line showing two variable relationships
  • Plot 4: Bar chart of counts by category
  • Plot 5: Correlation heatmap of numeric columns
  • Plot 6: Violin or swarm plot for distribution comparison
  • Apply consistent styling: use sns.set_theme(style='whitegrid')
  • Add a main title using fig.suptitle() with your analysis theme
  • Export the dashboard as a high-resolution PNG: plt.savefig('dashboard.png', dpi=300, bbox_inches='tight')

Portfolio Piece: This dashboard can serve as a portfolio project—add insights as text annotations!

📓 Practice in Notebook

Open notebook-sessions/week5/session2_data_visualization_group.ipynb and build a 2x2 dashboard with shared axes where appropriate. Use tight_layout() and fig.suptitle().

📜 Visualization Cheat Sheet

Plotting Patterns Reference
# ═══ SETUP ═══
import matplotlib.pyplot as plt
import seaborn as sns

# ═══ FIGURE & AXES ═══
fig, ax = plt.subplots(figsize=(10, 6))
fig, axes = plt.subplots(2, 2)  # 2x2 Grid

# ═══ BASIC PLOTS (Matplotlib) ═══
ax.plot(x, y)           # Line
ax.bar(cats, vals)      # Bar
ax.scatter(x, y)        # Scatter
ax.hist(data, bins=20)  # Histogram

# ═══ STATISTICAL PLOTS (Seaborn) ═══
sns.scatterplot(data=df, x='col1', y='col2', hue='cat')
sns.barplot(data=df, x='cat', y='num')
sns.boxplot(data=df, x='cat', y='num')
sns.heatmap(df.corr(numeric_only=True), annot=True)  # numeric_only skips text columns

# ═══ CUSTOMIZATION ═══
ax.set_title('Title')
ax.set_xlabel('X Label')
ax.tick_params(axis='x', labelrotation=45)
ax.legend()
plt.tight_layout()
🧠 Quick Quiz: Week 5 Concepts

1) Which interface provides explicit control over plots?

2) In Seaborn, which parameter colors points by category?

3) Which function prevents subplot overlap?

🎯 Key Takeaways: Data Visualization
  • Choose the Right Tool: Use Matplotlib for full control and customization, Seaborn for quick statistical plots with beautiful defaults.
  • Always Use OO Interface: fig, ax = plt.subplots() gives you explicit control over every element—essential for complex visualizations.
  • Label Everything: Titles, axis labels, and legends turn a plot into a communication tool. Never skip them.
  • Leverage Seaborn's Power: Use hue, size, and style to encode multiple dimensions in a single plot efficiently.
  • Master Subplots: Multi-panel figures (plt.subplots()) enable powerful comparisons and storytelling with data.
  • Iterate Visually: Visualization is exploratory—create rough plots quickly, then refine based on what the data reveals.
  • Mind the Details: Figure size, color palettes, and tight_layout() make the difference between amateur and professional visualizations.
  • Less is More: Remove unnecessary chart junk (excessive gridlines, 3D effects, too many colors) to focus attention on the data itself.