🗄️ Why SQL Matters

🎯 Learning Objectives
  • Understand Relational Databases (RDBMS)
  • Connect to databases using Python's `sqlite3`
  • Write SQL queries (SELECT, INSERT, WHERE)
  • Use SQLAlchemy for Object-Relational Mapping (ORM)
  • Load SQL data directly into Pandas DataFrames
📚 Key Vocabulary
  • RDBMS: Relational Database Management System—a database organized into tables with rows and columns, using SQL for querying (e.g., SQLite, PostgreSQL, MySQL).
  • SQL: Structured Query Language—a standard language for interacting with relational databases (SELECT, INSERT, UPDATE, DELETE).
  • ORM: Object-Relational Mapping—a technique that lets you interact with database tables as if they were Python objects (SQLAlchemy is a popular ORM).
  • Connection: A bridge between your Python script and a database, allowing you to send queries and receive results.
  • Cursor: An object used to execute SQL queries and fetch results from a database connection.
  • ETL: Extract, Transform, Load—a data pipeline process for moving data from sources (databases) to destinations (DataFrames, data warehouses).

🗄️ SQLite Basics

Work with lightweight, file-based databases using Python's built-in sqlite3 module. Perfect for local projects and prototypes.

🔗 SQLAlchemy ORM

Use SQLAlchemy to map database tables to Python classes, enabling object-oriented database operations without raw SQL.

🐼 Pandas + SQL

Load SQL query results directly into Pandas DataFrames with pd.read_sql() for seamless data analysis workflows.

🎯 Analogy: Database as a Library

Database: A library building.

Tables: Different sections (Fiction, Non-Fiction, Reference).

Rows: Individual books on the shelves.

Columns: Book attributes (Title, Author, ISBN, Year).

SQL Queries: Asking the librarian for specific books using a request form.

ORM (SQLAlchemy): Having a personal assistant who translates your plain-English requests into library catalog language.

Pandas Integration: The librarian hands you a spreadsheet (DataFrame) with all the books matching your request, ready for analysis.

💡 SQL + Python = Superpower

Data rarely lives in CSV files in production. It lives in databases. Python gives you the tools to extract, transform, and load (ETL) that data efficiently.

Real-World Applications:

  • E-commerce: Query product databases to analyze sales trends
  • Finance: Extract transaction history for fraud detection models
  • Healthcare: Pull patient records for research studies
  • Social Media: Analyze user engagement from massive database tables
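The ETL flow described above can be sketched with nothing but the built-in sqlite3 module. This is a minimal, illustrative pipeline—the table names and sample data are invented:

```python
import sqlite3

# Extract: pull raw rows from a source database
src = sqlite3.connect(':memory:')
src.execute("CREATE TABLE raw_sales (product TEXT, amount REAL)")
src.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                [('widget', 9.99), ('widget', 19.99), ('gadget', 5.00)])
rows = src.execute("SELECT product, amount FROM raw_sales").fetchall()

# Transform: aggregate totals per product in Python
totals = {}
for product, amount in rows:
    totals[product] = totals.get(product, 0) + amount

# Load: write the summary into a destination table
dst = sqlite3.connect(':memory:')
dst.execute("CREATE TABLE product_totals (product TEXT, total REAL)")
dst.executemany("INSERT INTO product_totals VALUES (?, ?)", totals.items())
summary = dst.execute("SELECT * FROM product_totals").fetchall()
print(summary)

src.close()
dst.close()
```

In real pipelines the transform step usually happens in Pandas (covered below), but the extract→transform→load shape is the same.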
📓 Practice in Notebook

Open notebook-sessions/week6/session1_sql_databases.ipynb and write a short note on when to choose raw SQL vs ORM. Add a tiny example for each.

💡 Learning Strategy: SQL + Python

Start with SQLite: It's built into Python and requires no setup—perfect for learning SQL fundamentals.

Learn SQL First, ORM Later: Understanding raw SQL makes you appreciate ORMs. Don't skip to SQLAlchemy without knowing basic SELECT and JOIN.

Use DB Browser for SQLite: Visualize your database structure with a GUI tool before writing Python code.

Practice Pandas Integration: pd.read_sql() is your bridge from databases to DataFrames—master it for seamless data pipelines.

Mind Your Connections: Always close database connections with conn.close(). Note that sqlite3's with block manages transactions (commit/rollback) but does not close the connection—wrap the connection in contextlib.closing() if you want automatic closing.

💾 SQLite Basics

SQLite is a lightweight, file-based database built into Python. No installation required!

Python ❌ Bad Practice SQL Injection Risk
# ❌ BAD: String concatenation for queries
import sqlite3

user_input = "admin'; DROP TABLE users; --"

# DANGEROUS! Allows SQL Injection attacks
query = f"SELECT * FROM users WHERE name = '{user_input}'"

print(f"Executing: {query}")
# Executing: SELECT * FROM users WHERE name = 'admin'; DROP TABLE users; --'
⚠️ SQL Injection

Never insert user input directly into a query string. Malicious users can destroy your database.

🏋️ Exercise: Parameterized Queries

Practice safe query techniques to prevent SQL injection:

  • Create a users table with columns: id, username, email, role
  • Write a safe INSERT using ? placeholders for all user-provided values
  • Create a search function that accepts a username parameter safely
  • Test with malicious input like '; DROP TABLE users; -- to verify protection
  • Compare the executed query string when using f-strings vs parameterized queries

Bonus: Create a login function that uses parameterized queries to check username AND password.
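As a starting point, here is one possible shape for the safe search function (find_user is a made-up helper name; the exercise asks you to build your own):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (id INT, username TEXT, email TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)",
             (1, 'alice', 'alice@example.com', 'admin'))
conn.commit()

def find_user(conn, username):
    # The ? placeholder sends username as DATA, never as SQL code
    cursor = conn.execute("SELECT * FROM users WHERE username = ?", (username,))
    return cursor.fetchall()

found = find_user(conn, 'alice')                    # normal lookup
safe = find_user(conn, "'; DROP TABLE users; --")   # attack string finds nothing
print(found, safe)

# The table still exists—the malicious input was treated as a literal value
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1
conn.close()
```

Contrast this with the f-string version above: there the attack string becomes part of the SQL; here it can only ever be compared against the username column.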

Python 🔰 Novice Basic SQLite
# 🔰 NOVICE: Manual connection management
import sqlite3

# Connect to file (or memory)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

cursor.execute("CREATE TABLE users (id INT, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.commit()

cursor.execute("SELECT * FROM users")
print(cursor.fetchall())

conn.close()  # Don't forget to close!
Python ⭐ Best Practice Context Managers & Params
# ⭐ BEST PRACTICE: Transaction Context Manager + Parameterized Queries
import sqlite3

conn = sqlite3.connect(':memory:')

# 'with conn' commits on success and rolls back on error.
# Note: it does NOT close the connection—close it yourself.
with conn:
    cursor = conn.cursor()
    
    cursor.execute("CREATE TABLE users (id INT, name TEXT)")
    
    # Safe parameter substitution (?)
    data = (1, 'Alice')
    cursor.execute("INSERT INTO users VALUES (?, ?)", data)
    # No explicit commit needed—the 'with' block commits for us
    
    # Querying
    cursor.execute("SELECT * FROM users WHERE name=?", ('Alice',))
    print(cursor.fetchall())

conn.close()  # The 'with' block did not close this
🏋️ Exercise: Employee Query

Write SELECT statements with WHERE, ORDER BY, and LIMIT:

  • Create an employees table with: id, name, department, salary, hire_date
  • Insert at least 10 employees across different departments
  • Query: Find all employees in the "Engineering" department
  • Query: Find employees with salary > 50000, ordered by salary DESC
  • Query: Get the top 5 highest-paid employees using LIMIT
  • Query: Find employees hired after a specific date using WHERE

Bonus: Combine multiple conditions with AND/OR operators.
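To see how WHERE, ORDER BY, and LIMIT compose, here is a smaller worked example (four made-up employees instead of the ten the exercise asks for):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE employees (id INT, name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, 'Alice', 'Engineering', 95000),
    (2, 'Bob',   'Sales',       48000),
    (3, 'Cara',  'Engineering', 72000),
    (4, 'Dan',   'Marketing',   55000),
])

# WHERE filters rows, ORDER BY ... DESC sorts, LIMIT caps the result count
top_earners = conn.execute(
    "SELECT name, salary FROM employees "
    "WHERE salary > 50000 ORDER BY salary DESC LIMIT 2"
).fetchall()
print(top_earners)  # [('Alice', 95000.0), ('Cara', 72000.0)]
conn.close()
```

The clauses always apply in that logical order: filter first, then sort, then limit.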

🏋️ Exercise: Multi-Table Queries

Practice INNER JOIN and LEFT JOIN between tables:

  • Create two tables: employees (id, name, department_id) and departments (id, name, location)
  • Insert sample data with some employees having valid department_id, some with NULL
  • Write an INNER JOIN to list employees with their department names
  • Write a LEFT JOIN to show ALL employees, even those without a department
  • Add a third table projects (id, name, employee_id) and join all three
  • Count how many employees are in each department using JOIN + GROUP BY

Bonus: Find departments with no employees using a RIGHT JOIN or subquery.
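The key difference between the two joins shows up with exactly the NULL case the exercise describes—an employee with no department. A minimal illustration (data invented):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INT, name TEXT)")
conn.execute("CREATE TABLE employees (id INT, name TEXT, department_id INT)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, 'Engineering'), (2, 'Sales')])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 'Alice', 1), (2, 'Bob', 2), (3, 'Cara', None)])

# INNER JOIN drops rows without a match (Cara has no department)
inner = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "INNER JOIN departments d ON e.department_id = d.id"
).fetchall()
print(inner)  # [('Alice', 'Engineering'), ('Bob', 'Sales')]

# LEFT JOIN keeps every employee; unmatched rows get NULL (None in Python)
left = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "LEFT JOIN departments d ON e.department_id = d.id"
).fetchall()
print(left)  # [('Alice', 'Engineering'), ('Bob', 'Sales'), ('Cara', None)]
conn.close()
```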

⚠️ Common Pitfalls: SQLite
  • SQL Injection Vulnerabilities: Never use f-strings or string concatenation for queries with user input. Always use parameterized queries (cursor.execute(query, params)).
  • Forgetting to Commit: INSERT, UPDATE, and DELETE require conn.commit() to save changes. Forgetting this means your data changes are lost.
  • Not Closing Connections: Always call conn.close(). Note that sqlite3's with block only commits or rolls back transactions—it does not close the connection, so unclosed connections still cause resource leaks and lock issues.
  • Type Mismatches: SQLite has dynamic typing, which can cause unexpected behavior. Be explicit about data types when creating tables.
💬 Discussion Questions
  1. Why is SQL injection considered one of the most dangerous security vulnerabilities? Can you think of real-world examples where this caused damage?
  2. When would you choose SQLite over a full database server like PostgreSQL or MySQL? What are SQLite's limitations?
  3. How does using parameterized queries (? placeholders) protect against SQL injection attacks?
  4. What's the difference between fetchone(), fetchmany(), and fetchall()? When would you use each?
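For question 4, the three fetch methods are easiest to see side by side on one cursor—they all consume the same result set, just in different amounts:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE nums (n INT)")
conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(5)])

cursor = conn.execute("SELECT n FROM nums ORDER BY n")
first = cursor.fetchone()       # one row (or None when exhausted)
next_two = cursor.fetchmany(2)  # a list of up to 2 rows
rest = cursor.fetchall()        # every remaining row at once
print(first, next_two, rest)    # (0,) [(1,), (2,)] [(3,), (4,)]
conn.close()
```

Rule of thumb: fetchone() for a single lookup, fetchmany(n) for batched processing of large results, fetchall() only when the result is known to be small.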
🎮 Practice Playground: Student Database

Challenge: Create a SQLite database school.db with a students table:

  • Columns: id (INTEGER PRIMARY KEY), name (TEXT), grade (INTEGER), gpa (REAL)
  • Insert 5 students with different grades and GPAs
  • Query students with GPA > 3.5
  • Update a student's grade
  • Delete a student by ID

Bonus: Wrap your connection in contextlib.closing() so it closes automatically when the with block ends.

📓 Practice in Notebook

Open notebook-sessions/week6/session1_sql_databases.ipynb and create a table, insert rows using ? parameters, and query them. Demonstrate a safe lookup by name.

🔗 SQLAlchemy ORM

Write Python, not SQL. SQLAlchemy maps Python classes to database tables.

Python ⭐ Modern ORM
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

# 1. Setup Engine
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

# 2. Define Model (Table)
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

Base.metadata.create_all(engine)

# 3. Interact via Session
Session = sessionmaker(bind=engine)
session = Session()

# Add
new_user = User(name='Alice')
session.add(new_user)
session.commit()

# Query
user = session.query(User).first()
print(f"User: {user.name} (ID: {user.id})")
⚠️ Common Pitfalls: SQLAlchemy
  • Forgetting to Commit Sessions: Like raw SQL, ORM changes require session.commit(). Without it, your INSERT/UPDATE/DELETE operations are rolled back.
  • N+1 Query Problem: Accessing relationships in loops causes separate queries for each item. Use joinedload() or subqueryload() to eagerly load relationships.
  • Not Closing Sessions: Always session.close() or use context managers. Unclosed sessions hold database connections and can exhaust connection pools.
  • Mixing ORM and Raw SQL: Using session.execute() for raw SQL bypasses ORM tracking. Be careful mixing paradigms—stick to one approach per transaction.
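The N+1 fix mentioned above looks like this in practice. This sketch assumes SQLAlchemy is installed; the Author/Book models are made up for illustration:

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker, joinedload

engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    books = relationship('Book')

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey('authors.id'))

Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Author(name='Ann', books=[Book(title='SQL 101'), Book(title='ORM Tips')]))
session.commit()

# N+1 risk: touching author.books inside a loop fires one extra query per author.
# joinedload() fetches authors AND their books in a single JOINed query instead:
authors = session.query(Author).options(joinedload(Author.books)).all()
for author in authors:
    print(author.name, [b.title for b in author.books])
session.close()
```

With 1,000 authors, the lazy version issues 1,001 queries; the joinedload() version issues one.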
💬 Discussion Questions
  1. What are the advantages of using an ORM like SQLAlchemy over raw SQL? What are the trade-offs in terms of performance and control?
  2. How does SQLAlchemy's declarative base pattern make database schema changes easier to manage across a team?
  3. When would you choose raw SQL queries over SQLAlchemy ORM methods? Give examples of scenarios where each is preferable.
  4. What is the N+1 query problem, and why is it a performance issue in production applications?
🚀 Challenge: Database CRUD App

Build a complete Create, Read, Update, Delete application using SQLite and Python:

  • Create: Function add_contact(name, email, phone) that inserts a new contact
  • Read: Function get_all_contacts() that returns all contacts as a list
  • Read: Function search_contacts(query) that searches by name or email
  • Update: Function update_contact(id, **kwargs) that updates any field
  • Delete: Function delete_contact(id) that removes a contact by ID
  • Use parameterized queries for ALL database operations
  • Include proper error handling with try/except blocks

Advanced: Add a simple command-line interface (CLI) menu to interact with your CRUD functions.

Extra Credit: Refactor to use SQLAlchemy ORM instead of raw sqlite3.
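To get you started, here is one possible shape for two of the CRUD functions—the function names follow the challenge, but the implementations are just one approach, not the reference solution:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # kept open so the functions below can use it
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)")

def add_contact(name, email, phone):
    # Parameterized INSERT—never build this string with f-strings
    conn.execute("INSERT INTO contacts (name, email, phone) VALUES (?, ?, ?)",
                 (name, email, phone))
    conn.commit()

def search_contacts(query):
    # LIKE with a parameterized pattern searches name OR email safely
    pattern = f"%{query}%"
    return conn.execute(
        "SELECT * FROM contacts WHERE name LIKE ? OR email LIKE ?",
        (pattern, pattern)
    ).fetchall()

add_contact('Alice', 'alice@example.com', '555-0100')
print(search_contacts('alice'))
```

Note that only the pattern is built with an f-string—the SQL itself still uses ? placeholders, so the user's query can never become SQL code.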

📓 Practice in Notebook

Open notebook-sessions/week6/session1_sql_databases.ipynb and define a User model, create the schema, insert a user, and query by name using SQLAlchemy.

🐼 Pandas & SQL

The ultimate duo. Read SQL query results directly into a DataFrame.

Python ⭐ High Efficiency
import pandas as pd
import sqlite3

# Setup dummy data
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE data (id INT, name TEXT)")
conn.execute("INSERT INTO data VALUES (1, 'Alice'), (2, 'Bob')")
conn.commit()

# ⭐ READ SQL to DATAFRAME
df = pd.read_sql("SELECT * FROM data", conn)
print(df)

# ⭐ WRITE DATAFRAME to SQL
df['new_col'] = 'Active'
df.to_sql('new_table', conn, if_exists='replace', index=False)

conn.close()
⚠️ Common Pitfalls: Pandas + SQL
  • Loading Huge Tables: pd.read_sql() loads entire result sets into memory. For large tables, use chunksize parameter or write better WHERE clauses to filter data in SQL first.
  • if_exists='replace' Dangers: Using to_sql() with if_exists='replace' deletes the entire table. Use 'append' for adding rows or 'fail' to prevent accidental overwrites.
  • Index Confusion: By default, to_sql() writes the DataFrame index as a column. Set index=False unless you explicitly want the index in your table.
  • Data Type Mismatches: Pandas infers SQL data types, which can cause issues. Use the dtype parameter in to_sql() to explicitly specify column types.
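The chunksize pitfall above is worth seeing in code. With chunksize set, pd.read_sql() returns an iterator of small DataFrames instead of one giant frame, so only one chunk is in memory at a time (the table here is a small stand-in for a genuinely large one):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE big (id INT, value REAL)")
conn.executemany("INSERT INTO big VALUES (?, ?)", [(i, i * 1.5) for i in range(100)])
conn.commit()

# Aggregate incrementally: each chunk holds at most 25 rows
total = 0.0
for chunk in pd.read_sql("SELECT * FROM big", conn, chunksize=25):
    total += chunk['value'].sum()

print(total)  # 7425.0
conn.close()
```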
🏋️ Exercise: Sales Analytics

Use GROUP BY and aggregate functions (COUNT, SUM, AVG) for data analysis:

  • Create a sales table with: id, product_name, category, quantity, price, sale_date
  • Insert at least 20 sales records across different products and categories
  • Query: Calculate total revenue per category using SUM(quantity * price)
  • Query: Find the average order value (AVG) per product
  • Query: Count the number of sales per day using GROUP BY sale_date
  • Query: Find categories with more than 5 sales using HAVING
  • Load results into a Pandas DataFrame and create a summary report

Bonus: Add MIN/MAX to find the cheapest and most expensive sale per category.
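A compressed version of the core pattern—GROUP BY plus an aggregate, with HAVING filtering the groups—looks like this (three invented rows instead of the twenty the exercise asks for):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sales (product TEXT, category TEXT, quantity INT, price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ('Laptop', 'Tech', 1, 1200.0),
    ('Mouse',  'Tech', 3,   25.0),
    ('Desk',   'Home', 2,  300.0),
])

# SUM(quantity * price) computes revenue per category;
# HAVING filters groups AFTER aggregation (WHERE filters rows before it)
revenue = conn.execute(
    "SELECT category, SUM(quantity * price) AS revenue "
    "FROM sales GROUP BY category HAVING revenue > 1000"
).fetchall()
print(revenue)  # [('Tech', 1275.0)]
conn.close()
```

Home's revenue (600.0) is computed but then dropped by HAVING—that ordering is exactly what distinguishes HAVING from WHERE.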

🚀 Challenge: Complete Analytics Dashboard

Build an end-to-end analytics pipeline combining SQL and Pandas:

  • Create a SQLite database with related tables: customers, orders, order_items, products
  • Write a complex JOIN query combining all tables to get order details
  • Use pd.read_sql() to load the joined data into a DataFrame
  • Calculate: Customer lifetime value, top 10 products by revenue, monthly sales trends
  • Use Pandas groupby() for additional aggregations not done in SQL
  • Export your analysis results back to new SQL tables using to_sql()

Advanced: Create a function that generates weekly sales reports automatically.

💬 Discussion Questions
  1. Why is it often better to filter data with SQL WHERE clauses instead of loading everything into Pandas first?
  2. When would you use pd.read_sql_query() vs. pd.read_sql_table()? What are the differences?
  3. How does chunksize in pd.read_sql() help with memory management? Give an example scenario where this is critical.
  4. What are the advantages of using Pandas DataFrames as an intermediate layer between databases and visualization tools?
🎮 Practice Playground: Sales Analysis Pipeline

Challenge: Build a complete ETL pipeline:

  • Create a SQLite database sales.db with a transactions table (columns: id, product, amount, date)
  • Insert 20 sample transactions with different products and dates
  • Use pd.read_sql() to query transactions from the last 7 days
  • Calculate total sales per product in Pandas
  • Write the aggregated results back to a new table product_summary using to_sql()

Bonus: Use SQLAlchemy engine instead of raw sqlite3 connection for better compatibility.

📓 Practice in Notebook

Open notebook-sessions/week6/session2_sql_databases_group.ipynb and load a table into a DataFrame with pd.read_sql. Add a new column and write it back using to_sql.

📚 Supplemental Materials: Extended SQL & Related Topics

Deepen your database skills with additional resources and explore how SQL integrates with your broader Python data toolkit.

🎯 Advanced Topics
  • Original SQL Lesson — Alternative explanations and additional SQL patterns
  • Advanced Pandas — Complement SQL queries with DataFrame operations
  • Data Visualization — Visualize query results effectively
📖 Extended Lesson: SQL Databases (Alternate)

For additional SQL coverage and alternative explanations, explore the original lesson materials:

🗄️ Open Extended SQL Lesson →

Topics covered: SQLite fundamentals, SQLAlchemy ORM, Pandas-SQL integration, and best practices.

📖 Related: Advanced Pandas Operations

Many SQL operations have Pandas equivalents. Master both to choose the right tool:

🐼 Open Advanced Pandas Lesson →

Comparison: SQL JOINs ↔ pd.merge(), GROUP BY ↔ df.groupby(), WHERE ↔ boolean filtering

💡 SQL vs Pandas: When to Use Each
  • Use SQL: Large datasets, persistent storage, complex joins, data integrity constraints
  • Use Pandas: Exploratory analysis, quick prototyping, complex transformations, visualization
  • Use Both: Query subset with SQL, analyze in Pandas, write results back to database
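The "use both" workflow is easiest to see with the same aggregation expressed both ways (data invented for illustration):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [('Ann', 10.0), ('Ann', 15.0), ('Bo', 7.0)])
conn.commit()

# In SQL: GROUP BY runs in the database, before data reaches Python
sql_totals = pd.read_sql(
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer", conn)

# In Pandas: load the rows first, then groupby() in memory
df = pd.read_sql("SELECT * FROM orders", conn)
pandas_totals = df.groupby('customer')['amount'].sum().reset_index(name='total')

print(sql_totals)
print(pandas_totals)  # same totals either way: Ann 25.0, Bo 7.0
conn.close()
```

For small data the choice barely matters; for large tables, aggregating in SQL means far fewer rows ever cross the wire into Pandas.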

📜 SQL Cheat Sheet

SQL Patterns Reference
# ═══ SQLITE3 ═══
import sqlite3
conn = sqlite3.connect('db.sqlite')
with conn:  # commits or rolls back—does NOT close the connection
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM table WHERE id=?", (1,))
    row = cursor.fetchone()

# ═══ PANDAS ═══
import pandas as pd
# Read
df = pd.read_sql("SELECT * FROM table", conn)
# Write
df.to_sql('table_name', conn, if_exists='append', index=False)
conn.close()

# ═══ BASIC SQL SYNTAX ═══
# SELECT col1, col2 FROM table WHERE col1 > 5
# INSERT INTO table (col1) VALUES (val1)
# UPDATE table SET col1 = val1 WHERE id = 1
# DELETE FROM table WHERE id = 1
🎯 Key Takeaways: SQL & Databases
  • Master SQLite First: Python's built-in sqlite3 is perfect for learning SQL fundamentals without server setup.
  • Always Use Parameterized Queries: Protect against SQL injection by using ? placeholders instead of f-strings or string concatenation.
  • Remember to Commit: INSERT, UPDATE, and DELETE require conn.commit() to persist changes—forgetting this loses your data modifications.
  • ORMs Simplify Complex Apps: SQLAlchemy lets you work with databases using Python objects, making schema changes and relationships easier to manage.
  • Pandas + SQL = ETL Power: pd.read_sql() and to_sql() create seamless pipelines from databases to DataFrames and back.
  • Filter in SQL, Not Pandas: Use WHERE clauses to reduce data volume before loading into Pandas—memory and performance depend on it.
  • Close Your Connections: Always call conn.close(); sqlite3's with statement commits transactions but does not close the connection, so close explicitly (or use contextlib.closing()) to avoid resource leaks and connection pool exhaustion.
  • Understand Data Types: Be explicit about column types when creating tables and writing DataFrames to avoid type mismatch issues.
🧠 Quick Quiz: Week 6 Concepts

1) What's the safest way to insert user input into an SQL query?

2) Which library provides ORM capabilities?

3) How do you load SQL results into a DataFrame?