🗄️ Why SQL Matters

🎯 Learning Objectives
  • Understand Relational Databases (RDBMS)
  • Connect to databases using Python's `sqlite3`
  • Write SQL queries (SELECT, INSERT, WHERE)
  • Use SQLAlchemy for Object-Relational Mapping (ORM)
  • Load SQL data directly into Pandas DataFrames
📚 Key Vocabulary
  • RDBMS: Relational Database Management System—a database organized into tables with rows and columns, using SQL for querying (e.g., SQLite, PostgreSQL, MySQL).
  • SQL: Structured Query Language—a standard language for interacting with relational databases (SELECT, INSERT, UPDATE, DELETE).
  • ORM: Object-Relational Mapping—a technique that lets you interact with database tables as if they were Python objects (SQLAlchemy is a popular ORM).
  • Connection: A bridge between your Python script and a database, allowing you to send queries and receive results.
  • Cursor: An object used to execute SQL queries and fetch results from a database connection.
  • ETL: Extract, Transform, Load—a data pipeline process for moving data from sources (databases) to destinations (DataFrames, data warehouses).

🗄️ SQLite Basics

Work with lightweight, file-based databases using Python's built-in sqlite3 module. Perfect for local projects and prototypes.

🔗 SQLAlchemy ORM

Use SQLAlchemy to map database tables to Python classes, enabling object-oriented database operations without raw SQL.

🐼 Pandas + SQL

Load SQL query results directly into Pandas DataFrames with pd.read_sql() for seamless data analysis workflows.

🎯 Analogy: Database as a Library

Database: A library building.

Tables: Different sections (Fiction, Non-Fiction, Reference).

Rows: Individual books on the shelves.

Columns: Book attributes (Title, Author, ISBN, Year).

SQL Queries: Asking the librarian for specific books using a request form.

ORM (SQLAlchemy): Having a personal assistant who translates your plain-English requests into library catalog language.

Pandas Integration: The librarian hands you a spreadsheet (DataFrame) with all the books matching your request, ready for analysis.

💡 SQL + Python = Superpower

Data rarely lives in CSV files in production. It lives in databases. Python gives you the tools to extract, transform, and load (ETL) that data efficiently.

Real-World Applications:

  • E-commerce: Query product databases to analyze sales trends
  • Finance: Extract transaction history for fraud detection models
  • Healthcare: Pull patient records for research studies
  • Social Media: Analyze user engagement from massive database tables
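The ETL flow described above can be sketched with nothing but the built-in sqlite3 module. This is a minimal, illustrative pipeline—the table names and sample data are invented:

```python
import sqlite3

# Extract: pull raw rows from a source database
src = sqlite3.connect(':memory:')
src.execute("CREATE TABLE raw_sales (product TEXT, amount REAL)")
src.executemany("INSERT INTO raw_sales VALUES (?, ?)",
                [('widget', 9.99), ('widget', 19.99), ('gadget', 5.00)])
rows = src.execute("SELECT product, amount FROM raw_sales").fetchall()

# Transform: aggregate totals per product in Python
totals = {}
for product, amount in rows:
    totals[product] = totals.get(product, 0) + amount

# Load: write the summary into a destination table
dst = sqlite3.connect(':memory:')
dst.execute("CREATE TABLE product_totals (product TEXT, total REAL)")
dst.executemany("INSERT INTO product_totals VALUES (?, ?)", totals.items())
summary = dst.execute("SELECT * FROM product_totals").fetchall()
print(summary)

src.close()
dst.close()
```

In real pipelines the transform step usually happens in Pandas (covered below), but the extract→transform→load shape is the same.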
📓 Practice in Notebook

Open notebook-sessions/week6/session1_sql_databases.ipynb and write a short note on when to choose raw SQL vs ORM. Add a tiny example for each.

💡 Learning Strategy: SQL + Python

Start with SQLite: It's built into Python and requires no setup—perfect for learning SQL fundamentals.

Learn SQL First, ORM Later: Understanding raw SQL makes you appreciate ORMs. Don't skip to SQLAlchemy without knowing basic SELECT and JOIN.

Use DB Browser for SQLite: Visualize your database structure with a GUI tool before writing Python code.

Practice Pandas Integration: pd.read_sql() is your bridge from databases to DataFrames—master it for seamless data pipelines.

Mind Your Connections: Always close database connections with conn.close(). Note that sqlite3's with block manages transactions (commit/rollback) but does not close the connection—wrap the connection in contextlib.closing() if you want automatic closing.

💾 SQLite Basics

SQLite is a lightweight, file-based database built into Python. No installation required!

Python ❌ Bad Practice SQL Injection Risk
# ❌ BAD: String concatenation for queries
import sqlite3

user_input = "admin'; DROP TABLE users; --"

# DANGEROUS! Allows SQL Injection attacks
query = f"SELECT * FROM users WHERE name = '{user_input}'"

print(f"Executing: {query}")
# Executing: SELECT * FROM users WHERE name = 'admin'; DROP TABLE users; --'
⚠️ SQL Injection

Never insert user input directly into a query string. Malicious users can destroy your database.

🏋️ Exercise: Parameterized Queries

Practice safe query techniques to prevent SQL injection:

  • Create a users table with columns: id, username, email, role
  • Write a safe INSERT using ? placeholders for all user-provided values
  • Create a search function that accepts a username parameter safely
  • Test with malicious input like '; DROP TABLE users; -- to verify protection
  • Compare the executed query string when using f-strings vs parameterized queries

Bonus: Create a login function that uses parameterized queries to check username AND password.
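As a starting point, here is one possible shape for the safe search function (find_user is a made-up helper name; the exercise asks you to build your own):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE users (id INT, username TEXT, email TEXT, role TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?, ?, ?)",
             (1, 'alice', 'alice@example.com', 'admin'))
conn.commit()

def find_user(conn, username):
    # The ? placeholder sends username as DATA, never as SQL code
    cursor = conn.execute("SELECT * FROM users WHERE username = ?", (username,))
    return cursor.fetchall()

found = find_user(conn, 'alice')                    # normal lookup
safe = find_user(conn, "'; DROP TABLE users; --")   # attack string finds nothing
print(found, safe)

# The table still exists—the malicious input was treated as a literal value
count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # 1
conn.close()
```

Contrast this with the f-string version above: there the attack string becomes part of the SQL; here it can only ever be compared against the username column.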

Python 🔰 Novice Basic SQLite
# 🔰 NOVICE: Manual connection management
import sqlite3

# Connect to file (or memory)
conn = sqlite3.connect(':memory:')
cursor = conn.cursor()

cursor.execute("CREATE TABLE users (id INT, name TEXT)")
cursor.execute("INSERT INTO users VALUES (1, 'Alice')")
conn.commit()

cursor.execute("SELECT * FROM users")
print(cursor.fetchall())

conn.close()  # Don't forget to close!
Python ⭐ Best Practice Context Managers & Params
# ⭐ BEST PRACTICE: Transaction Context Manager + Parameterized Queries
import sqlite3

conn = sqlite3.connect(':memory:')

# 'with conn' commits on success and rolls back on error.
# Note: it does NOT close the connection—close it yourself.
with conn:
    cursor = conn.cursor()
    
    cursor.execute("CREATE TABLE users (id INT, name TEXT)")
    
    # Safe parameter substitution (?)
    data = (1, 'Alice')
    cursor.execute("INSERT INTO users VALUES (?, ?)", data)
    # No explicit commit needed—the 'with' block commits for us
    
    # Querying
    cursor.execute("SELECT * FROM users WHERE name=?", ('Alice',))
    print(cursor.fetchall())

conn.close()  # The 'with' block did not close this
🏋️ Exercise: Employee Query

Write SELECT statements with WHERE, ORDER BY, and LIMIT:

  • Create an employees table with: id, name, department, salary, hire_date
  • Insert at least 10 employees across different departments
  • Query: Find all employees in the "Engineering" department
  • Query: Find employees with salary > 50000, ordered by salary DESC
  • Query: Get the top 5 highest-paid employees using LIMIT
  • Query: Find employees hired after a specific date using WHERE

Bonus: Combine multiple conditions with AND/OR operators.
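To see how WHERE, ORDER BY, and LIMIT compose, here is a smaller worked example (four made-up employees instead of the ten the exercise asks for):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE employees (id INT, name TEXT, department TEXT, salary REAL)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?, ?)", [
    (1, 'Alice', 'Engineering', 95000),
    (2, 'Bob',   'Sales',       48000),
    (3, 'Cara',  'Engineering', 72000),
    (4, 'Dan',   'Marketing',   55000),
])

# WHERE filters rows, ORDER BY ... DESC sorts, LIMIT caps the result count
top_earners = conn.execute(
    "SELECT name, salary FROM employees "
    "WHERE salary > 50000 ORDER BY salary DESC LIMIT 2"
).fetchall()
print(top_earners)  # [('Alice', 95000.0), ('Cara', 72000.0)]
conn.close()
```

The clauses always apply in that logical order: filter first, then sort, then limit.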

🏋️ Exercise: Multi-Table Queries

Practice INNER JOIN and LEFT JOIN between tables:

  • Create two tables: employees (id, name, department_id) and departments (id, name, location)
  • Insert sample data with some employees having valid department_id, some with NULL
  • Write an INNER JOIN to list employees with their department names
  • Write a LEFT JOIN to show ALL employees, even those without a department
  • Add a third table projects (id, name, employee_id) and join all three
  • Count how many employees are in each department using JOIN + GROUP BY

Bonus: Find departments with no employees using a RIGHT JOIN or subquery.
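The key difference between the two joins shows up with exactly the NULL case the exercise describes—an employee with no department. A minimal illustration (data invented):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE departments (id INT, name TEXT)")
conn.execute("CREATE TABLE employees (id INT, name TEXT, department_id INT)")
conn.executemany("INSERT INTO departments VALUES (?, ?)",
                 [(1, 'Engineering'), (2, 'Sales')])
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)",
                 [(1, 'Alice', 1), (2, 'Bob', 2), (3, 'Cara', None)])

# INNER JOIN drops rows without a match (Cara has no department)
inner = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "INNER JOIN departments d ON e.department_id = d.id"
).fetchall()
print(inner)  # [('Alice', 'Engineering'), ('Bob', 'Sales')]

# LEFT JOIN keeps every employee; unmatched rows get NULL (None in Python)
left = conn.execute(
    "SELECT e.name, d.name FROM employees e "
    "LEFT JOIN departments d ON e.department_id = d.id"
).fetchall()
print(left)  # [('Alice', 'Engineering'), ('Bob', 'Sales'), ('Cara', None)]
conn.close()
```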

⚠️ Common Pitfalls: SQLite
  • SQL Injection Vulnerabilities: Never use f-strings or string concatenation for queries with user input. Always use parameterized queries (cursor.execute(query, params)).
  • Forgetting to Commit: INSERT, UPDATE, and DELETE require conn.commit() to save changes. Forgetting this means your data changes are lost.
  • Not Closing Connections: Always call conn.close(). Note that sqlite3's with block only commits or rolls back transactions—it does not close the connection, so unclosed connections still cause resource leaks and lock issues.
  • Type Mismatches: SQLite has dynamic typing, which can cause unexpected behavior. Be explicit about data types when creating tables.
💬 Discussion Questions
  1. Why is SQL injection considered one of the most dangerous security vulnerabilities? Can you think of real-world examples where this caused damage?
  2. When would you choose SQLite over a full database server like PostgreSQL or MySQL? What are SQLite's limitations?
  3. How does using parameterized queries (? placeholders) protect against SQL injection attacks?
  4. What's the difference between fetchone(), fetchmany(), and fetchall()? When would you use each?
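For question 4, the three fetch methods are easiest to see side by side on one cursor—they all consume the same result set, just in different amounts:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE nums (n INT)")
conn.executemany("INSERT INTO nums VALUES (?)", [(i,) for i in range(5)])

cursor = conn.execute("SELECT n FROM nums ORDER BY n")
first = cursor.fetchone()       # one row (or None when exhausted)
next_two = cursor.fetchmany(2)  # a list of up to 2 rows
rest = cursor.fetchall()        # every remaining row at once
print(first, next_two, rest)    # (0,) [(1,), (2,)] [(3,), (4,)]
conn.close()
```

Rule of thumb: fetchone() for a single lookup, fetchmany(n) for batched processing of large results, fetchall() only when the result is known to be small.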
🎮 Practice Playground: Student Database

Challenge: Create a SQLite database school.db with a students table:

  • Columns: id (INTEGER PRIMARY KEY), name (TEXT), grade (INTEGER), gpa (REAL)
  • Insert 5 students with different grades and GPAs
  • Query students with GPA > 3.5
  • Update a student's grade
  • Delete a student by ID

Bonus: Wrap your connection in contextlib.closing() so it closes automatically when the with block ends.

📓 Practice in Notebook

Open notebook-sessions/week6/session1_sql_databases.ipynb and create a table, insert rows using ? parameters, and query them. Demonstrate a safe lookup by name.

🔗 SQLAlchemy ORM

Write Python, not SQL. SQLAlchemy maps Python classes to database tables.

Python ⭐ Modern ORM
from sqlalchemy import create_engine, Column, Integer, String
from sqlalchemy.orm import declarative_base, sessionmaker

# 1. Setup Engine
engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

# 2. Define Model (Table)
class User(Base):
    __tablename__ = 'users'
    id = Column(Integer, primary_key=True)
    name = Column(String)

Base.metadata.create_all(engine)

# 3. Interact via Session
Session = sessionmaker(bind=engine)
session = Session()

# Add
new_user = User(name='Alice')
session.add(new_user)
session.commit()

# Query
user = session.query(User).first()
print(f"User: {user.name} (ID: {user.id})")
⚠️ Common Pitfalls: SQLAlchemy
  • Forgetting to Commit Sessions: Like raw SQL, ORM changes require session.commit(). Without it, your INSERT/UPDATE/DELETE operations are rolled back.
  • N+1 Query Problem: Accessing relationships in loops causes separate queries for each item. Use joinedload() or subqueryload() to eagerly load relationships.
  • Not Closing Sessions: Always session.close() or use context managers. Unclosed sessions hold database connections and can exhaust connection pools.
  • Mixing ORM and Raw SQL: Using session.execute() for raw SQL bypasses ORM tracking. Be careful mixing paradigms—stick to one approach per transaction.
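The N+1 fix mentioned above looks like this in practice. This sketch assumes SQLAlchemy is installed; the Author/Book models are made up for illustration:

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.orm import declarative_base, relationship, sessionmaker, joinedload

engine = create_engine('sqlite:///:memory:')
Base = declarative_base()

class Author(Base):
    __tablename__ = 'authors'
    id = Column(Integer, primary_key=True)
    name = Column(String)
    books = relationship('Book')

class Book(Base):
    __tablename__ = 'books'
    id = Column(Integer, primary_key=True)
    title = Column(String)
    author_id = Column(Integer, ForeignKey('authors.id'))

Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add(Author(name='Ann', books=[Book(title='SQL 101'), Book(title='ORM Tips')]))
session.commit()

# N+1 risk: touching author.books inside a loop fires one extra query per author.
# joinedload() fetches authors AND their books in a single JOINed query instead:
authors = session.query(Author).options(joinedload(Author.books)).all()
for author in authors:
    print(author.name, [b.title for b in author.books])
session.close()
```

With 1,000 authors, the lazy version issues 1,001 queries; the joinedload() version issues one.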
💬 Discussion Questions
  1. What are the advantages of using an ORM like SQLAlchemy over raw SQL? What are the trade-offs in terms of performance and control?
  2. How does SQLAlchemy's declarative base pattern make database schema changes easier to manage across a team?
  3. When would you choose raw SQL queries over SQLAlchemy ORM methods? Give examples of scenarios where each is preferable.
  4. What is the N+1 query problem, and why is it a performance issue in production applications?
🚀 Challenge: Database CRUD App

Build a complete Create, Read, Update, Delete application using SQLite and Python:

  • Create: Function add_contact(name, email, phone) that inserts a new contact
  • Read: Function get_all_contacts() that returns all contacts as a list
  • Read: Function search_contacts(query) that searches by name or email
  • Update: Function update_contact(id, **kwargs) that updates any field
  • Delete: Function delete_contact(id) that removes a contact by ID
  • Use parameterized queries for ALL database operations
  • Include proper error handling with try/except blocks

Advanced: Add a simple command-line interface (CLI) menu to interact with your CRUD functions.

Extra Credit: Refactor to use SQLAlchemy ORM instead of raw sqlite3.
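To get you started, here is one possible shape for two of the CRUD functions—the function names follow the challenge, but the implementations are just one approach, not the reference solution:

```python
import sqlite3

conn = sqlite3.connect(':memory:')  # kept open so the functions below can use it
conn.execute(
    "CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT, phone TEXT)")

def add_contact(name, email, phone):
    # Parameterized INSERT—never build this string with f-strings
    conn.execute("INSERT INTO contacts (name, email, phone) VALUES (?, ?, ?)",
                 (name, email, phone))
    conn.commit()

def search_contacts(query):
    # LIKE with a parameterized pattern searches name OR email safely
    pattern = f"%{query}%"
    return conn.execute(
        "SELECT * FROM contacts WHERE name LIKE ? OR email LIKE ?",
        (pattern, pattern)
    ).fetchall()

add_contact('Alice', 'alice@example.com', '555-0100')
print(search_contacts('alice'))
```

Note that only the pattern is built with an f-string—the SQL itself still uses ? placeholders, so the user's query can never become SQL code.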

📓 Practice in Notebook

Open notebook-sessions/week6/session1_sql_databases.ipynb and define a User model, create the schema, insert a user, and query by name using SQLAlchemy.

🐼 Pandas & SQL

The ultimate duo. Read SQL query results directly into a DataFrame.

Python ⭐ High Efficiency
import pandas as pd
import sqlite3

# Setup dummy data
conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE data (id INT, name TEXT)")
conn.execute("INSERT INTO data VALUES (1, 'Alice'), (2, 'Bob')")
conn.commit()

# ⭐ READ SQL to DATAFRAME
df = pd.read_sql("SELECT * FROM data", conn)
print(df)

# ⭐ WRITE DATAFRAME to SQL
df['new_col'] = 'Active'
df.to_sql('new_table', conn, if_exists='replace', index=False)

conn.close()
⚠️ Common Pitfalls: Pandas + SQL
  • Loading Huge Tables: pd.read_sql() loads entire result sets into memory. For large tables, use chunksize parameter or write better WHERE clauses to filter data in SQL first.
  • if_exists='replace' Dangers: Using to_sql() with if_exists='replace' deletes the entire table. Use 'append' for adding rows or 'fail' to prevent accidental overwrites.
  • Index Confusion: By default, to_sql() writes the DataFrame index as a column. Set index=False unless you explicitly want the index in your table.
  • Data Type Mismatches: Pandas infers SQL data types, which can cause issues. Use the dtype parameter in to_sql() to explicitly specify column types.
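The chunksize pitfall above is worth seeing in code. With chunksize set, pd.read_sql() returns an iterator of small DataFrames instead of one giant frame, so only one chunk is in memory at a time (the table here is a small stand-in for a genuinely large one):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE big (id INT, value REAL)")
conn.executemany("INSERT INTO big VALUES (?, ?)", [(i, i * 1.5) for i in range(100)])
conn.commit()

# Aggregate incrementally: each chunk holds at most 25 rows
total = 0.0
for chunk in pd.read_sql("SELECT * FROM big", conn, chunksize=25):
    total += chunk['value'].sum()

print(total)  # 7425.0
conn.close()
```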
🏋️ Exercise: Sales Analytics

Use GROUP BY and aggregate functions (COUNT, SUM, AVG) for data analysis:

  • Create a sales table with: id, product_name, category, quantity, price, sale_date
  • Insert at least 20 sales records across different products and categories
  • Query: Calculate total revenue per category using SUM(quantity * price)
  • Query: Find the average order value (AVG) per product
  • Query: Count the number of sales per day using GROUP BY sale_date
  • Query: Find categories with more than 5 sales using HAVING
  • Load results into a Pandas DataFrame and create a summary report

Bonus: Add MIN/MAX to find the cheapest and most expensive sale per category.
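A compressed version of the core pattern—GROUP BY plus an aggregate, with HAVING filtering the groups—looks like this (three invented rows instead of the twenty the exercise asks for):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE sales (product TEXT, category TEXT, quantity INT, price REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", [
    ('Laptop', 'Tech', 1, 1200.0),
    ('Mouse',  'Tech', 3,   25.0),
    ('Desk',   'Home', 2,  300.0),
])

# SUM(quantity * price) computes revenue per category;
# HAVING filters groups AFTER aggregation (WHERE filters rows before it)
revenue = conn.execute(
    "SELECT category, SUM(quantity * price) AS revenue "
    "FROM sales GROUP BY category HAVING revenue > 1000"
).fetchall()
print(revenue)  # [('Tech', 1275.0)]
conn.close()
```

Home's revenue (600.0) is computed but then dropped by HAVING—that ordering is exactly what distinguishes HAVING from WHERE.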

🚀 Challenge: Complete Analytics Dashboard

Build an end-to-end analytics pipeline combining SQL and Pandas:

  • Create a SQLite database with related tables: customers, orders, order_items, products
  • Write a complex JOIN query combining all tables to get order details
  • Use pd.read_sql() to load the joined data into a DataFrame
  • Calculate: Customer lifetime value, top 10 products by revenue, monthly sales trends
  • Use Pandas groupby() for additional aggregations not done in SQL
  • Export your analysis results back to new SQL tables using to_sql()

Advanced: Create a function that generates weekly sales reports automatically.

💬 Discussion Questions
  1. Why is it often better to filter data with SQL WHERE clauses instead of loading everything into Pandas first?
  2. When would you use pd.read_sql_query() vs. pd.read_sql_table()? What are the differences?
  3. How does chunksize in pd.read_sql() help with memory management? Give an example scenario where this is critical.
  4. What are the advantages of using Pandas DataFrames as an intermediate layer between databases and visualization tools?
🎮 Practice Playground: Sales Analysis Pipeline

Challenge: Build a complete ETL pipeline:

  • Create a SQLite database sales.db with a transactions table (columns: id, product, amount, date)
  • Insert 20 sample transactions with different products and dates
  • Use pd.read_sql() to query transactions from the last 7 days
  • Calculate total sales per product in Pandas
  • Write the aggregated results back to a new table product_summary using to_sql()

Bonus: Use SQLAlchemy engine instead of raw sqlite3 connection for better compatibility.

📓 Practice in Notebook

Open notebook-sessions/week6/session2_sql_databases_group.ipynb and load a table into a DataFrame with pd.read_sql. Add a new column and write it back using to_sql.

📚 Supplemental Materials: Extended SQL & Related Topics

Deepen your database skills with additional resources and explore how SQL integrates with your broader Python data toolkit.

🎯 Advanced Topics
  • Original SQL Lesson — Alternative explanations and additional SQL patterns
  • Advanced Pandas — Complement SQL queries with DataFrame operations
  • Data Visualization — Visualize query results effectively
📖 Extended Lesson: SQL Databases (Alternate)

For additional SQL coverage and alternative explanations, explore the original lesson materials:

🗄️ Open Extended SQL Lesson →

Topics covered: SQLite fundamentals, SQLAlchemy ORM, Pandas-SQL integration, and best practices.

📖 Related: Advanced Pandas Operations

Many SQL operations have Pandas equivalents. Master both to choose the right tool:

🐼 Open Advanced Pandas Lesson →

Comparison: SQL JOINs ↔ pd.merge(), GROUP BY ↔ df.groupby(), WHERE ↔ boolean filtering

💡 SQL vs Pandas: When to Use Each
  • Use SQL: Large datasets, persistent storage, complex joins, data integrity constraints
  • Use Pandas: Exploratory analysis, quick prototyping, complex transformations, visualization
  • Use Both: Query subset with SQL, analyze in Pandas, write results back to database
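The "use both" workflow is easiest to see with the same aggregation expressed both ways (data invented for illustration):

```python
import sqlite3
import pandas as pd

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [('Ann', 10.0), ('Ann', 15.0), ('Bo', 7.0)])
conn.commit()

# In SQL: GROUP BY runs in the database, before data reaches Python
sql_totals = pd.read_sql(
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer", conn)

# In Pandas: load the rows first, then groupby() in memory
df = pd.read_sql("SELECT * FROM orders", conn)
pandas_totals = df.groupby('customer')['amount'].sum().reset_index(name='total')

print(sql_totals)
print(pandas_totals)  # same totals either way: Ann 25.0, Bo 7.0
conn.close()
```

For small data the choice barely matters; for large tables, aggregating in SQL means far fewer rows ever cross the wire into Pandas.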

📜 SQL Cheat Sheet

SQL Patterns Reference
# ═══ SQLITE3 ═══
import sqlite3
conn = sqlite3.connect('db.sqlite')
with conn:  # commits or rolls back—does NOT close the connection
    cursor = conn.cursor()
    cursor.execute("SELECT * FROM table WHERE id=?", (1,))
    row = cursor.fetchone()

# ═══ PANDAS ═══
import pandas as pd
# Read
df = pd.read_sql("SELECT * FROM table", conn)
# Write
df.to_sql('table_name', conn, if_exists='append', index=False)
conn.close()

# ═══ BASIC SQL SYNTAX ═══
# SELECT col1, col2 FROM table WHERE col1 > 5
# INSERT INTO table (col1) VALUES (val1)
# UPDATE table SET col1 = val1 WHERE id = 1
# DELETE FROM table WHERE id = 1
🎯 Key Takeaways: SQL & Databases
  • Master SQLite First: Python's built-in sqlite3 is perfect for learning SQL fundamentals without server setup.
  • Always Use Parameterized Queries: Protect against SQL injection by using ? placeholders instead of f-strings or string concatenation.
  • Remember to Commit: INSERT, UPDATE, and DELETE require conn.commit() to persist changes—forgetting this loses your data modifications.
  • ORMs Simplify Complex Apps: SQLAlchemy lets you work with databases using Python objects, making schema changes and relationships easier to manage.
  • Pandas + SQL = ETL Power: pd.read_sql() and to_sql() create seamless pipelines from databases to DataFrames and back.
  • Filter in SQL, Not Pandas: Use WHERE clauses to reduce data volume before loading into Pandas—memory and performance depend on it.
  • Close Your Connections: Always call conn.close(); sqlite3's with statement commits transactions but does not close the connection, so close explicitly (or use contextlib.closing()) to avoid resource leaks and connection pool exhaustion.
  • Understand Data Types: Be explicit about column types when creating tables and writing DataFrames to avoid type mismatch issues.
🧠 Quick Quiz: Week 6 Concepts

1) What's the safest way to insert user input into an SQL query?

2) Which library provides ORM capabilities?

3) How do you load SQL results into a DataFrame?